Estimating Obesity Levels Using Decision Trees and K-Fold Cross-Validation: A Study on Eating Habits and Physical Conditions
DOI:
https://doi.org/10.56705/ijodas.v5i1.126
Keywords:
Obesity Prediction, Decision Tree, Public Health, Lifestyle Factors, Health Interventions
Abstract
This study applies machine learning to explore the determinants of obesity in populations from Mexico, Peru, and Colombia, using a Decision Tree algorithm evaluated with 5-fold cross-validation. Our analysis of lifestyle and physical condition data from 2111 individuals yielded accuracy, precision, recall, and F1-scores that peaked in the third and fifth folds. The findings affirm dietary habits and physical activity as substantial predictors of obesity levels. The variability in model performance across the folds underscores the importance of robust cross-validation when assessing the model's generalizability. This research contributes to the growing field of data science in public health by providing a viable model for obesity prediction and laying the groundwork for targeted health interventions. The insights are relevant to public health officials and policymakers as a stepping stone towards more sophisticated, data-driven approaches to combating obesity. The study recognizes, however, the inherent limitations of self-reported data and the need for broader datasets that encompass more diverse variables. Future research directions include analyzing longitudinal data to establish causal relationships and comparing various machine learning models to optimize predictive performance.
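The evaluation protocol summarized in the abstract (a decision tree scored per fold on accuracy, precision, recall, and F1 under 5-fold cross-validation) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the abstract does not specify the split criterion, tree depth, or averaging scheme, so the Gini impurity, the depth limit of 5, and the macro-averaged metrics are all assumptions. The data are a synthetic toy set with hypothetical feature and class names, not the study's 2111 records.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=5):
    # Leaf when the node is pure or the (assumed) depth limit is reached.
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # every feature is constant in this node
        return Counter(y).most_common(1)[0][0]
    _, j, t = best
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def predict(tree, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def macro_scores(y_true, y_pred):
    # Returns (accuracy, macro precision, macro recall, macro F1).
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    acc = sum(a == b for a, b in pairs) / len(pairs)
    p_sum = r_sum = f_sum = 0.0
    for c in classes:
        tp = sum(a == c and b == c for a, b in pairs)
        fp = sum(a != c and b == c for a, b in pairs)
        fn = sum(a == c and b != c for a, b in pairs)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        p_sum += p
        r_sum += r
        f_sum += 2 * p * r / (p + r) if p + r else 0.0
    n = len(classes)
    return acc, p_sum / n, r_sum / n, f_sum / n

def k_fold_indices(n, k=5, seed=0):
    # Shuffle once, then deal indices into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, k=5, seed=0):
    results = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        tree = build_tree([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(tree, X[i]) for i in test_idx]
        results.append(macro_scores([y[i] for i in test_idx], preds))
    return results

# Synthetic toy data (NOT the study's dataset): one informative feature
# and one noise feature, with three hypothetical class labels.
rng = random.Random(42)
X, y = [], []
for _ in range(150):
    activity, noise = rng.uniform(0, 3), rng.uniform(0, 3)
    X.append([activity, noise])
    y.append("level_a" if activity < 1 else "level_b" if activity < 2 else "level_c")

results = cross_validate(X, y, k=5)
for fold, (acc, prec, rec, f1) in enumerate(results, start=1):
    print(f"fold {fold}: acc={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Reporting the scores fold by fold, rather than only their mean, makes fold-to-fold variability of the kind the abstract describes visible. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier` together with its cross-validation utilities would likely replace the hand-written tree above.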
License
Authors retain copyright and full publishing rights to their articles. Upon acceptance, authors grant Indonesian Journal of Data and Science a non-exclusive license to publish the work and to identify itself as the original publisher.
Self-archiving. Authors may deposit the submitted version, accepted manuscript, and version of record in institutional or subject repositories, with citation to the published article and a link to the version of record on the journal website.
Commercial permissions. Uses intended for commercial advantage or monetary compensation are not permitted under CC BY-NC 4.0. For permissions, contact the editorial office at ijodas.journal@gmail.com.
Legacy notice. Some earlier PDFs may display “Copyright © [Journal Name]” or only a CC BY-NC logo without the full license text. To ensure clarity, the authors maintain copyright, and all articles are distributed under CC BY-NC 4.0. Where any discrepancy exists, this policy and the article landing-page license statement prevail.










