Predictive Modelling of Liver Disease Using Biochemical Markers and K-Nearest Neighbors Algorithm
Abstract
Deaths related to liver cirrhosis are rising, driven by increased alcohol consumption, chronic hepatitis infections, and obesity-related liver conditions. Early detection is critical for improving patient outcomes, yet female patients often experience delayed diagnosis. This study develops a predictive model for liver disease from biochemical markers and investigates gender disparities in diagnostic accuracy. The dataset contains 584 patient records from north-east Andhra Pradesh, India, with ten variables per patient: age, gender, total bilirubin, direct bilirubin, alkaline phosphatase, SGPT, SGOT, total proteins, albumin, and the albumin/globulin ratio. The data were pre-processed by encoding categorical variables and scaling numerical features. The K-Nearest Neighbors (K-NN) algorithm was used for classification, and performance was evaluated with cross-validation. Performance varied across folds: accuracy ranged from 57.76% to 73.28%, precision from 58.14% to 70.56%, recall from 57.76% to 73.28%, and F1-score from 57.95% to 70.45%. These results indicate that biochemical markers have potential for predicting liver disease and highlight significant gender disparities in diagnostic accuracy. The study contributes a practical predictive tool and identifies gender-specific diagnostic challenges. Future research should use larger, more diverse datasets and evaluate additional machine learning algorithms to improve predictive accuracy and address gender disparities in liver disease diagnosis.
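The pipeline summarized in the abstract (encode, scale, classify with K-NN, evaluate per fold) can be illustrated with a minimal sketch using only the Python standard library. It is not the authors' implementation. The records are synthetic placeholders, and the helper names, the three-feature layout, the "Male"/"Female" labels, the number of folds (5), and the neighbor count (k = 5) are all assumptions, since the abstract does not state them. The metrics are averaged by class support, which is also an assumption; it is consistent with the reported recall range matching the accuracy range, because support-weighted recall equals accuracy by definition.

```python
import math
import random
from collections import Counter

def encode_gender(value):
    """Map the categorical gender field to 0/1 (assumed labels 'Male'/'Female')."""
    return 1.0 if value == "Male" else 0.0

def fit_scaler(rows):
    """Per-column (mean, std), computed on training rows only to avoid leakage."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std or 1.0))  # guard against zero variance
    return stats

def scale(rows, stats):
    """Standardize each column with the supplied (mean, std) pairs."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, support in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support / n
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return accuracy, precision, recall, f1

def cross_validate(X, y, n_folds=5, k=5, seed=0):
    """Shuffle once, split into n_folds, and score K-NN on each held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    results = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        stats = fit_scaler([X[i] for i in train])
        train_X = scale([X[i] for i in train], stats)
        test_X = scale([X[i] for i in fold], stats)
        train_y = [y[i] for i in train]
        preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
        results.append(weighted_scores([y[i] for i in fold], preds))
    return results

def make_synthetic_records(n=200, seed=0):
    """Fabricated stand-in data, NOT the study's dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < 0.7)  # imbalanced classes, chosen arbitrarily
        gender = rng.choice(["Male", "Female"])
        age = rng.uniform(18, 80)
        bilirubin = rng.gauss(3.0 if label else 1.0, 1.0)
        X.append([age, encode_gender(gender), bilirubin])
        y.append(label)
    return X, y

X, y = make_synthetic_records()
fold_scores = cross_validate(X, y)
for acc, prec, rec, f1 in fold_scores:
    print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Fitting the scaler inside each fold keeps test-fold statistics out of training. Scaling matters for K-NN in particular, because Euclidean distance would otherwise be dominated by the variables with the largest numeric range.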