Classification of Indonesian-Language Online News Topics Using a TF-IDF-Based Support Vector Machine

I Gede Angga Pratama
I Wayan Sudiarsa
Dewa Made Artha Wiguna
I Made Dwi Windu Saputra
I Made Teguh Winantara

Abstract

The growth of online media has sharply increased the number of online news articles, creating a need for an automated method to group and label news topics efficiently. Manual labeling is time-consuming, resource-intensive, and prone to inconsistency. This study therefore implements and evaluates the Support Vector Machine (SVM) algorithm for automatically labeling the topics of Indonesian-language online news. It takes a quantitative, experimental approach on an online news dataset that has undergone text cleaning and preprocessing, including normalization, tokenization, stopword removal, and advanced text processing. Documents are represented with Term Frequency–Inverse Document Frequency (TF-IDF) weighting, which converts the text into high-dimensional numeric vectors. The classification model is an SVM with a linear kernel, implemented with the Scikit-learn library. The dataset is split into training and test data at a ratio of 80:20 so that model performance is evaluated objectively on documents the model has not seen. The SVM model achieves an accuracy of 79.38%, with relatively balanced precision, recall, and F1-score values across most topic classes. Confusion matrix analysis shows that most documents are classified correctly, although errors still occur between topics with similar contexts. These findings indicate that combining TF-IDF with an SVM is effective for classifying Indonesian online news text and could support the development of digital content management systems that automate topic labeling.
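The pipeline summarized in the abstract (text cleaning, TF-IDF weighting, a linear-kernel SVM in Scikit-learn, an 80:20 split, and precision/recall/F1 plus a confusion matrix) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy corpus, the topic labels (`olahraga`, `ekonomi`, `politik`), the stopword set, the regex-based cleaning, `random_state=42`, and the helper names `preprocess` and `train_and_evaluate` are all hypothetical, and the unspecified "advanced text processing" step is omitted. Scikit-learn's `TfidfVectorizer` is used with its defaults (smoothed IDF, L2 normalization), which may differ from the weighting the study actually used.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy stopword list; the study's actual list is not specified.
STOPWORDS = {"yang", "di", "dan", "ke", "dari", "itu", "ini", "pada", "untuk", "atas"}

# Hypothetical toy corpus invented for illustration; NOT the study's dataset.
TEXTS = [
    "Timnas Indonesia menang 2-0 atas Vietnam di stadion",
    "Pemain sepak bola itu mencetak gol pada menit akhir",
    "Pebulu tangkis nasional juara turnamen terbuka",
    "Pelatih timnas mengumumkan skuad untuk laga persahabatan",
    "Klub liga satu merekrut striker asing baru",
    "Harga beras naik menjelang akhir tahun",
    "Bank Indonesia menahan suku bunga acuan",
    "Rupiah melemah terhadap dolar pada perdagangan pagi",
    "Inflasi tahunan tercatat turun bulan lalu",
    "Investor asing membeli saham perbankan",
    "DPR mengesahkan rancangan undang-undang baru",
    "Partai koalisi mengusung calon presiden",
    "Menteri dalam negeri melantik penjabat gubernur",
    "Komisi pemilihan umum menetapkan jadwal kampanye",
    "Presiden menggelar rapat kabinet di istana",
]
LABELS = ["olahraga"] * 5 + ["ekonomi"] * 5 + ["politik"] * 5


def preprocess(text):
    """Simplified cleaning: lowercase, strip non-letters, tokenize, drop stopwords."""
    text = text.lower()                    # case normalization
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits, punctuation, symbols
    tokens = text.split()                  # whitespace tokenization
    return " ".join(t for t in tokens if t not in STOPWORDS)


def train_and_evaluate(texts, labels, seed=42):
    """80:20 stratified split, TF-IDF features, linear-kernel SVM, standard metrics."""
    docs = [preprocess(t) for t in texts]
    x_train, x_test, y_train, y_test = train_test_split(
        docs, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    # The vectorizer sits inside the pipeline, so IDF weights are learned
    # from the training split only and do not leak test-set statistics.
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    classes = sorted(set(labels))
    return {
        "model": model,
        "accuracy": accuracy_score(y_test, y_pred),
        "report": classification_report(y_test, y_pred, labels=classes, zero_division=0),
        "confusion": confusion_matrix(y_test, y_pred, labels=classes),
        "y_test": list(y_test),
        "y_pred": list(y_pred),
    }


if __name__ == "__main__":
    result = train_and_evaluate(TEXTS, LABELS)
    print(f"accuracy: {result['accuracy']:.2%}")
    print(result["report"])
    print(result["confusion"])
```

With only 15 invented documents the printed accuracy says nothing about real performance; the sketch is meant to show the shape of the workflow. Keeping `TfidfVectorizer` and `SVC` in one pipeline is a design choice that makes the 80:20 evaluation honest, since the test documents never influence the vocabulary or IDF weights.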

How to Cite
Pratama, I. G. A., Sudiarsa, I. W., Wiguna, D. M. A., Saputra, I. M. D. W., & Winantara, I. M. T. (2026). Klasifikasi Topik Berita Online Bahasa Indonesia Menggunakan Support Vector Machine Berbasis TF-IDF. Journal of Multidisciplinary Inquiry in Science, Technology and Educational Research, 3(1), 1603–1611. https://doi.org/10.32672/mister.v3i1.4130
