Estimating Obesity Levels Using Decision Trees and K-Fold Cross-Validation: A Study on Eating Habits and Physical Conditions
Abstract
This study applies machine learning to explore the determinants of obesity in populations from Mexico, Peru, and Colombia, using a Decision Tree algorithm evaluated with 5-fold cross-validation. The analysis covers lifestyle and physical condition data from 2111 individuals, and the resulting accuracy, precision, recall, and F1-scores peaked in the third and fifth folds. The findings indicate that dietary habits and physical activity are substantial predictors of obesity levels. Model performance varied across the folds, which underscores the importance of cross-validation for judging how well the model generalizes. This research contributes to the growing use of data science in public health by providing a viable model for obesity prediction and laying the groundwork for targeted health interventions. The insights are relevant to public health officials and policymakers as a step towards more sophisticated, data-driven approaches to combating obesity. The study recognizes the inherent limitations of self-reported data and the need for broader datasets that encompass more diverse variables. Future research directions include the analysis of longitudinal data to establish causal relationships and the comparison of various machine learning models to optimize predictive performance.
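The evaluation procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' actual pipeline. The data is a synthetic stand-in produced by `make_classification`; the number of features and classes, the stratified shuffled splitting, the macro averaging of the metrics, and the helper name `evaluate_folds` are all assumptions made here for illustration. Only the sample size (2111), the Decision Tree classifier, the 5 folds, and the four metrics come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def evaluate_folds(X, y, n_splits=5, seed=0):
    """Train a decision tree on each fold and return one metrics dict per fold.

    Hypothetical helper: stratification, shuffling, and macro averaging
    are assumptions, not details taken from the study.
    """
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = []
    for fold, (train_idx, test_idx) in enumerate(splitter.split(X, y), start=1):
        # Fit a fresh tree on the training part of this fold.
        model = DecisionTreeClassifier(random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])

        # Macro-average so every class counts equally in each metric.
        precision, recall, f1, _ = precision_recall_fscore_support(
            y[test_idx], pred, average="macro", zero_division=0
        )
        results.append(
            {
                "fold": fold,
                "accuracy": accuracy_score(y[test_idx], pred),
                "precision": precision,
                "recall": recall,
                "f1": f1,
            }
        )
    return results


# Synthetic stand-in for the survey data (assumed shape: 16 features, 7 classes).
X, y = make_classification(
    n_samples=2111,
    n_features=16,
    n_informative=8,
    n_classes=7,
    n_clusters_per_class=1,
    random_state=0,
)

fold_metrics = evaluate_folds(X, y)
for row in fold_metrics:
    print(row)
```

Reporting the metrics per fold, rather than only their mean, is what makes fold-to-fold variability of the kind the abstract mentions visible.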

Copyright (c) 2024 Indonesian Journal of Data and Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.