Classification of Retail Supermarket Consumer Types Using a Decision Tree Based on Transaction Data
Abstract
This study applies a Decision Tree algorithm to classify customer types in a retail supermarket based on transaction data. The dataset, obtained from a public Kaggle repository, contains 1,000 transaction records covering customer attributes, purchased product categories, transaction details, and time of purchase. Data processing and analysis were carried out in Google Colab using Python libraries such as pandas and scikit-learn. The research method comprises data preprocessing, feature engineering, encoding of categorical variables, train/test splitting, model training, and evaluation. Model performance was measured with accuracy, precision, recall, and F1-score. The Decision Tree achieved an accuracy of approximately 56.5% when using all features and 61.5% when using a selected subset of key features with a limited maximum tree depth. These results indicate that customer types can be classified from transactional and demographic attributes to some extent, but that the model's performance leaves substantial room for improvement. The study contributes to the understanding of customer segmentation in retail analytics and provides a baseline for future work with more advanced machine learning methods.
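The workflow summarized above (preprocessing, feature engineering, categorical encoding, splitting, training, and evaluation) can be sketched with pandas and scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the synthetic data, the column names (`gender`, `product_line`, `quantity`, `unit_price`, `time`, `customer_type`), the one-hot encoding, the 80/20 split, the weighted averaging of precision/recall/F1, the choice of "key features", and `max_depth=3` are all hypothetical, because the abstract does not specify them. Since the data are random, the printed scores will hover around chance and do not reproduce the study's results.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic stand-in for the 1,000-record Kaggle dataset.
# Column names and value ranges are assumptions, not the study's schema.
rng = np.random.default_rng(42)
n = 1000
hours = rng.integers(10, 21, size=n)
minutes = rng.integers(0, 60, size=n)
df = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], size=n),
    "product_line": rng.choice(["Food", "Health", "Electronics", "Fashion"], size=n),
    "quantity": rng.integers(1, 11, size=n),
    "unit_price": rng.uniform(10, 100, size=n).round(2),
    "time": [f"{int(h):02d}:{int(m):02d}" for h, m in zip(hours, minutes)],
    "customer_type": rng.choice(["Member", "Normal"], size=n),
})

# Preprocessing (assumed): drop duplicate rows and rows with missing values.
df = df.drop_duplicates().dropna()

# Feature engineering (assumed): purchase hour, transaction total,
# and a coarse time-of-day bucket.
df["hour"] = pd.to_datetime(df["time"], format="%H:%M").dt.hour
df["total"] = df["quantity"] * df["unit_price"]
df["time_of_day"] = pd.cut(
    df["hour"], bins=[0, 12, 17, 24],
    labels=["morning", "afternoon", "evening"], right=False,
)

# Encode categorical variables with one-hot columns (assumed encoding).
categorical = ["gender", "product_line", "time_of_day"]
numeric = ["quantity", "unit_price", "total", "hour"]
X_all = pd.get_dummies(df[categorical + numeric], columns=categorical, dtype=int)
y = df["customer_type"]


def evaluate(X, y, max_depth=None, seed=42):
    """Split 80/20, fit a Decision Tree, and return the four metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Weighted averaging across the classes is an assumption.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )
    return {"accuracy": accuracy_score(y_test, y_pred),
            "precision": precision, "recall": recall, "f1": f1}


results = {
    # All features, unrestricted depth.
    "all_features": evaluate(X_all, y),
    # Hypothetical "key feature" subset with a depth limit.
    "key_features_depth_limited": evaluate(X_all[["total", "hour"]], y, max_depth=3),
}
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

To approximate the study's setup, the synthetic `df` would need to be replaced with the actual Kaggle CSV (for example via `pd.read_csv`) and the column names adjusted accordingly. Capping `max_depth` limits how finely the tree can partition the training data, which is a common way to curb overfitting on small tabular datasets such as this one.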
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.