Application of the Naive Bayes Algorithm for Stroke Risk Prediction Based on Patient Clinical Data
Abstract
This study aims to develop an accurate, interpretable, and effective method for predicting stroke risk from clinical patient data, motivated by the high incidence of stroke and the limitations of conventional early detection approaches. The Naive Bayes algorithm is applied as a probabilistic classifier to a clinical dataset that includes demographic variables, medical history, lifestyle factors, and medical indicators such as blood pressure, glucose level, and body mass index. The research stages are data preprocessing, attribute encoding, handling class imbalance, and splitting the dataset into 80% training data and 20% testing data. The training data are used to build the model, and the testing data are used to evaluate its performance on previously unseen cases. Model performance is assessed using accuracy, precision, recall, and F1-score. The model achieved an accuracy of 70.00%, a stroke-class precision of 66.67%, and a recall of 80.00%, with an F1-score that indicates a reasonably good ability to identify stroke cases. These findings indicate that Naive Bayes can predict stroke risk from the available clinical patterns, and they point to further development through hybrid models or comparison with other machine learning algorithms.
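The relationship between the reported metrics can be checked from a confusion matrix. The sketch below is an illustration only, not the authors' code. The counts (4 true positives, 2 false positives, 1 false negative, 3 true negatives) are hypothetical values chosen because they reproduce the reported accuracy, precision, and recall; the study's actual test-set counts are not stated in the abstract. The function name `classification_metrics` is also an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for the positive (stroke) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (assumption): picked only to match the reported percentages.
metrics = classification_metrics(tp=4, fp=2, fn=1, tn=3)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, accuracy is 7/10, precision is 4/6, and recall is 4/5, so the F1-score comes to 8/11, or about 72.73%. Any test set whose counts are a common multiple of these values would give the same percentages.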
Article Details

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.