Obesity Prediction with Machine Learning Models: Comparing the Performance of Various Algorithms

  • Yudha Islami Sulistya, Telkom University
  • Maie Istighosah, Telkom University

Keywords: obesity prediction, machine learning, ensemble method, healthcare planning

Abstract

Obesity poses a significant global health risk due to its links to conditions such as diabetes, cardiovascular disease, and various cancers, underscoring the need for early prediction to enable timely intervention. This study evaluated the performance of seven machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, ExtraTrees, Gradient Boosting, AdaBoost, and XGBoost) in predicting obesity from health and lifestyle data. The models were assessed on accuracy, precision, recall, and F1-score, with hyperparameter tuning applied for optimization. The ExtraTrees Classifier performed best, achieving an accuracy of 92.6%, precision of 92.7%, recall of 92.8%, and F1-score of 92.7%. Random Forest (91.3% accuracy) and XGBoost (89.9% accuracy) also showed strong predictive ability. In contrast, Logistic Regression (74.3% accuracy) and AdaBoost (73.0% accuracy) were less effective, highlighting the advantage of ensemble methods such as ExtraTrees for accurate obesity prediction. These findings suggest that ensemble models offer a promising approach for early diagnosis and targeted healthcare interventions.
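For readers who want to reproduce a comparison along these lines, the sketch below trains the seven classifiers named above on a generic tabular dataset and reports the four metrics. It is a minimal illustration under stated assumptions, not the authors' pipeline: the file name, label column, train/test split, macro averaging, and default hyperparameters are placeholders, and the hyperparameter tuning described in the abstract is omitted.

```python
# Minimal sketch of the seven-model comparison described in the abstract.
# Assumptions (not from the paper): file name, label column, one-hot encoding
# of categorical features, 80/20 stratified split, macro-averaged metrics,
# and mostly default hyperparameters (no tuning step shown).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, AdaBoostClassifier)
from xgboost import XGBClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

df = pd.read_csv("obesity_lifestyle.csv")                # hypothetical file name
X = pd.get_dummies(df.drop(columns=["ObesityLevel"]))    # hypothetical label column
y = LabelEncoder().fit_transform(df["ObesityLevel"])     # integer class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=300, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="mlogloss", random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Macro averaging weights every obesity class equally; the abstract does
    # not specify how per-class metrics were aggregated, so this is a choice
    # made for illustration only.
    print(f"{name:18s} "
          f"acc={accuracy_score(y_test, pred):.3f} "
          f"prec={precision_score(y_test, pred, average='macro'):.3f} "
          f"rec={recall_score(y_test, pred, average='macro'):.3f} "
          f"f1={f1_score(y_test, pred, average='macro'):.3f}")
```

A tuning step of the kind mentioned in the abstract would typically search over parameters such as the number of trees, maximum depth, and learning rate for the ensemble and boosting models before the final evaluation.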

Published
2025-05-30