Machine Learning-Based Prediction of HIV/AIDS Infection and Treatment Effectiveness: A Clinical Dataset Analysis

Agus Aan Jiwa Permana; I Gusti Ngurah Wikranta Arsa; Ahmad Naswin; Sumiyatun

doi:10.56705/ijaimi.v3i2.362

Authors

Agus Aan Jiwa Permana, Universitas Pendidikan Ganesha
I Gusti Ngurah Wikranta Arsa, ITB STIKOM Bali
Ahmad Naswin, Universitas Megarezky Makassar
Sumiyatun, Universitas Teknologi Digital Indonesia

Keywords:

HIV/AIDS Prediction, Machine Learning, Antiretroviral Therapy, Treatment Effectiveness, CD4 Dynamics

Abstract

Early and accurate prediction of HIV/AIDS infection is critical to improving clinical decision-making and patient management. This study presents a machine learning approach to predicting HIV/AIDS infection status and evaluating the effectiveness of antiretroviral treatments, using a well-documented clinical dataset from 1996 that comprises 2,139 patient records and 34 features. Through preprocessing, exploratory data analysis, and feature engineering, several new clinically relevant attributes were constructed, such as CD4/CD8 ratios and immunological change metrics. Four machine learning models (Logistic Regression, Support Vector Machine, Random Forest, and Gradient Boosting) were trained and evaluated. The Gradient Boosting classifier achieved the highest ROC-AUC, 0.9335. Random Forest followed with a ROC-AUC of 0.9180 and was selected for further evaluation because of its model transparency. Key features influencing infection prediction included CD4+ and CD8+ dynamics, baseline immunological levels, and treatment history. The study also examined treatment effectiveness by analyzing CD4+ cell count responses across therapy types. The combination of ZDV and ddI emerged as the most effective regimen, improving immune outcomes and lowering infection rates, while ZDV monotherapy showed the least favorable results. This work underscores the potential of machine learning as a clinical decision support tool in HIV/AIDS care and provides data-driven insights into treatment optimization. Future studies should incorporate longitudinal patient data and validate the models in real-world clinical settings for broader applicability.
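The engineered attributes and the ROC-AUC metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Visit` record, its field names, and the specific change features are assumptions chosen for illustration, and the ROC-AUC is computed directly from its pairwise-ranking definition rather than with any particular library.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical patient record; field names are illustrative, not from the paper."""
    cd4_base: float    # CD4+ count at baseline
    cd4_follow: float  # CD4+ count at follow-up
    cd8_base: float    # CD8+ count at baseline
    cd8_follow: float  # CD8+ count at follow-up


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def engineer_features(v: Visit) -> dict:
    """Derive ratio and change features of the kind the abstract mentions (assumed forms)."""
    return {
        "cd4_cd8_ratio_base": _safe_div(v.cd4_base, v.cd8_base),
        "cd4_cd8_ratio_follow": _safe_div(v.cd4_follow, v.cd8_follow),
        "cd4_change": v.cd4_follow - v.cd4_base,
        "cd4_pct_change": _safe_div(v.cd4_follow - v.cd4_base, v.cd4_base),
    }


def roc_auc(labels, scores) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values, invented purely to exercise the functions.
features = engineer_features(Visit(cd4_base=400, cd4_follow=300, cd8_base=800, cd8_follow=900))
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On this reading, a ROC-AUC of 0.9335 would mean that a randomly chosen infected patient is scored above a randomly chosen uninfected patient about 93% of the time.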
