Grid Search Hyperparameter Analysis in Optimizing The Decision Tree Method for Diabetes Prediction

  • Desi Anggreani Universitas Muhammadiyah Makassar
  • Hamdani Universitas Muhammadiyah Makassar
  • Nurmisba Universitas Muhammadiyah Makassar
  • Lukman Universitas Muhammadiyah Makassar

Keywords: Diabetes, Decision Tree, Hyperparameter Optimization, Grid Search

Abstract

Diabetes is a global health issue whose prevalence continues to rise, especially in Indonesia, driven by unhealthy lifestyles, poor diets, and genetic factors. Early detection of diabetes risk is crucial to prevent serious complications, and machine learning offers a practical way to build predictive screening tools. This research develops a diabetes risk prediction model using the Decision Tree algorithm, with hyperparameters optimized through Grid Search. The methodology uses patient medical data with key attributes such as glucose levels, blood pressure, skin health, insulin, body mass index (BMI), diabetes pedigree, age, and health history. Hyperparameter tuning varies three parameters: the maximum tree depth (max_depth), the minimum number of samples required to split a node (min_samples_split), and the minimum number of samples required at a leaf node (min_samples_leaf). Grid Search systematically evaluates every combination of these values to find the configuration that gives the best model performance. The research process consists of data preprocessing, splitting the dataset into training and testing sets, model training, and evaluation using accuracy, the confusion matrix, and the ROC curve with its AUC. The baseline model reached an accuracy of 76%, which rose to 81% after hyperparameter optimization with Grid Search. Visualization of the decision tree shows that glucose levels and BMI contribute most to predicting diabetes risk. This research demonstrates the potential of machine learning to support the early detection of diabetes, with the Decision Tree algorithm showing promising predictive capability. Nevertheless, further research with larger datasets and the integration of other algorithms is recommended to improve the accuracy and generalization of the model. The main contribution of this research is a machine learning-based approach that can help medical personnel screen for diabetes risk more efficiently and accurately.
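The workflow described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, whose `DecisionTreeClassifier` uses the same parameter names (max_depth, min_samples_split, min_samples_leaf). The abstract does not name the library, the grid values, the split ratio, or the number of cross-validation folds, so all of those are assumptions here. The data is synthetic, so the printed scores are not comparable with the 76% and 81% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient data (assumption: 8 numeric
# features and a binary diabetes label). Not the study's dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=42
)

# Hold out a test set; the 80/20 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline: a decision tree with default hyperparameters.
baseline = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Hypothetical grid over the three hyperparameters named in the abstract.
param_grid = {
    "max_depth": [3, 5, 7, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Grid Search fits one model per combination and per fold, then keeps
# the combination with the best mean cross-validated accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,  # 5 folds is an assumption
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the tuned tree on the held-out test set.
tuned = search.best_estimator_
y_pred = tuned.predict(X_test)
tuned_acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, tuned.predict_proba(X_test)[:, 1])

print("Best parameters:", search.best_params_)
print("Baseline accuracy:", round(baseline_acc, 3))
print("Tuned accuracy:", round(tuned_acc, 3))
print("Confusion matrix:")
print(cm)
print("ROC AUC:", round(auc, 3))
```

Grid Search selects parameters by cross-validation on the training set only, so the test set stays untouched until the final evaluation. On other data the tuned tree is not guaranteed to beat the baseline on the test set.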

Published
2024-12-31
How to Cite
Anggreani, D., Hamdani, Nurmisba, & Lukman. (2024). Grid Search Hyperparameter Analysis in Optimizing The Decision Tree Method for Diabetes Prediction. Indonesian Journal of Data and Science, 5(3), 190-197. https://doi.org/10.56705/ijodas.v5i3.190