Gender-Aware Prediction of Liver Disease Using Machine Learning and Clinical Laboratory Data

Authors

  • Umar Zaky, Universitas Teknologi Yogyakarta
  • Muhammad Habibi, Universitas Jenderal Achmad Yani Yogyakarta
  • Adri Priadana, Universitas Jenderal Achmad Yani Yogyakarta
  • Thomas Edyson Tarigan, Universitas Teknologi Digital Indonesia

DOI:

https://doi.org/10.56705/wtsdw234

Keywords:

Liver Disease Prediction, Machine Learning, Clinical Biomarkers, SMOTENC, Gender-Aware Evaluation, Explainable AI

Abstract

Liver disease is a major health problem that may progress silently and lead to severe clinical complications if not detected early. Machine learning offers a promising approach for supporting early screening by identifying predictive patterns from clinical and biochemical patient data. This study developed an explainable gender-aware machine learning framework for liver disease prediction using demographic information and clinical biomarkers. The dataset consisted of 570 patient records after duplicate removal, including age, gender, total bilirubin, direct bilirubin, alkaline phosphatase, SGPT, SGOT, total protein, albumin, albumin/globulin ratio, and liver disease status. Several machine learning algorithms were evaluated under three experimental scenarios: original data, class-weighted learning, and SMOTENC-based oversampling. Model performance was assessed using accuracy, precision, recall, specificity, F1-score, and ROC-AUC. The experimental results showed that Gradient Boosting combined with SMOTENC achieved the best F1-score, with an accuracy of 0.7632, precision of 0.7935, recall of 0.9012, specificity of 0.4242, F1-score of 0.8439, and ROC-AUC of 0.7759. The model correctly identified 73 of 81 liver disease cases in the testing set, indicating strong sensitivity for early screening. Gender-based evaluation showed comparable F1-scores for male and female patients, with values of 0.8430 and 0.8462, respectively. Feature importance analysis identified SGOT, alkaline phosphatase, age, and direct bilirubin as the most influential predictors. These findings suggest that an explainable and gender-aware machine learning approach can support liver disease risk prediction using routinely available clinical biomarkers, although further validation using larger and more balanced datasets is required.
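
The threshold-based metrics reported above all follow from a single confusion matrix, and a short sketch can make that relationship concrete. The abstract states only the positive-class counts (73 of 81 liver disease cases detected); the negative-class counts used below (14 true negatives, 19 false positives, for a 114-record test set) are not given in the source and are an assumption, inferred because they reproduce the reported specificity, precision, and accuracy. The function name `screening_metrics` is hypothetical. ROC-AUC is omitted because it depends on predicted probabilities, not on a single confusion matrix.

```python
def screening_metrics(tp, fn, tn, fp):
    """Compute threshold-based classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)        # share of predicted positives that are true cases
    recall = tp / (tp + fn)           # sensitivity: share of true cases detected
    specificity = tn / (tn + fp)      # share of non-cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# tp and fn come from the abstract (73 of 81 liver disease cases identified).
# tn and fp are ASSUMED: they are not stated in the source and were chosen
# because they are consistent with the reported specificity and accuracy.
metrics = screening_metrics(tp=73, fn=8, tn=14, fp=19)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the sketch reproduces the reported accuracy, precision, recall, specificity, and F1-score to four decimal places.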

Published

2026-04-15