Predicting Cardiovascular Disease Using Machine Learning: A Feature Engineering and Model Comparison Approach

Authors

  • Bagus Satrio Waluyo Poetro, Universitas Islam Sultan Agung
  • Dian Hafidh Zulfikar, Universitas Islam Negeri Raden Intan Lampung
  • I Made Sunia Raharja, Universitas Udayana
  • Nicodemus Mardanus Setiohardjo, Politeknik Negeri Kupang

DOI:

https://doi.org/10.56705/ijaimi.v3i2.363

Keywords:

Cardiovascular Disease, Machine Learning, Gradient Boosting, Risk Prediction, Health Informatics

Abstract

Cardiovascular disease (CVD) remains one of the leading causes of mortality worldwide, which underscores the need for early detection and effective risk stratification. As clinical and lifestyle-related health data become more widely available, machine learning (ML) has become a powerful tool for data-driven diagnosis and decision-making in healthcare. This study develops and evaluates several supervised ML models that predict the presence of cardiovascular disease from non-invasive features collected during routine medical check-ups. The dataset comprises 69,301 individual records and includes age, gender, blood pressure, cholesterol, glucose level, body measurements, and lifestyle habits. After data cleaning and feature engineering, including the derivation of body mass index (BMI), mean arterial pressure (MAP), and pulse pressure, four classifiers were trained: Logistic Regression, Random Forest, Gradient Boosting, and Support Vector Machine (SVM). Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. Among the models tested, the Gradient Boosting Classifier performed best, achieving an ROC-AUC of 0.8060 with a balanced precision-recall tradeoff, which indicates strong discriminatory power. ROC curves and confusion matrices further illustrated the advantage of Gradient Boosting in distinguishing patients with CVD from those without. These findings support the viability of ML-driven risk assessment models as decision-support tools in clinical settings, where they could aid earlier diagnosis and more personalized intervention strategies.
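
The abstract names three derived features and several evaluation metrics without giving their formulas. The following is a minimal, illustrative sketch in plain Python of how they could be computed; it is not the authors' code. It assumes the standard definitions BMI = weight (kg) / height (m)² and pulse pressure = systolic minus diastolic pressure, and it approximates MAP as DBP + (SBP - DBP) / 3, a common clinical formula that the study may or may not have used. The `CheckupRecord` fields and the `is_plausible` cleaning rule are hypothetical, and the four classifiers themselves are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class CheckupRecord:
    """One routine check-up record (hypothetical field names)."""
    height_cm: float
    weight_kg: float
    systolic: float   # systolic blood pressure, mmHg
    diastolic: float  # diastolic blood pressure, mmHg


def is_plausible(r: CheckupRecord) -> bool:
    # Assumed cleaning rule: keep a record only if diastolic pressure is
    # positive and strictly below systolic pressure.
    return 0 < r.diastolic < r.systolic


def engineer_features(r: CheckupRecord) -> dict:
    height_m = r.height_cm / 100.0
    bmi = r.weight_kg / height_m ** 2
    pulse_pressure = r.systolic - r.diastolic
    # Assumed approximation: MAP = DBP + (SBP - DBP) / 3
    mean_arterial_pressure = r.diastolic + pulse_pressure / 3.0
    return {"bmi": bmi, "map": mean_arterial_pressure,
            "pulse_pressure": pulse_pressure}


def classification_metrics(y_true, y_pred) -> dict:
    # Confusion-matrix counts for binary labels (1 = CVD, 0 = no CVD).
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores) -> float:
    # ROC-AUC equals the probability that a randomly chosen positive case
    # receives a higher score than a randomly chosen negative case
    # (ties count as one half). Pairwise loop: fine for small examples only.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rec = CheckupRecord(height_cm=170, weight_kg=72.25, systolic=120, diastolic=80)
    print(is_plausible(rec), engineer_features(rec))
    print(classification_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

For a record of 170 cm, 72.25 kg, and 120/80 mmHg, this gives a BMI of about 25, a pulse pressure of 40 mmHg, and a MAP of about 93.3 mmHg. The pairwise AUC loop grows quadratically with the number of records, so for a dataset of 69,301 records a sorting-based routine such as scikit-learn's `roc_auc_score` would be the practical choice.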

Published

2025-11-29