Predictive Modelling of Chronic Kidney Disease Using Gaussian Naive Bayes Algorithm
Abstract
Chronic Kidney Disease (CKD) is a major global health problem associated with significant morbidity and mortality, and early detection is vital for effective management and better patient outcomes. This study applies the Gaussian Naive Bayes algorithm to CKD prediction using a Kaggle dataset containing health information from 1,659 patients. Data pre-processing comprised feature selection, data scaling, and an 80/20 split into training and test sets. Under 5-fold cross-validation, the model achieved an average accuracy of 89.93%, precision of 88.15%, recall of 89.93%, and F1-score of 88.42%, indicating that it identifies CKD cases reliably on this dataset. Correlation heatmaps, 3D PCA projections, and t-SNE plots were used to examine feature relationships and data distribution. The results support the hypothesis that Gaussian Naive Bayes can predict CKD effectively and could serve as a tool for early diagnosis. The study contributes to the medical field by demonstrating the utility of machine learning for improving diagnostic accuracy. Its limitations include possible dataset biases and the lack of comparison with other algorithms. Future research should expand the dataset, incorporate more features, and evaluate additional machine learning models to improve predictive performance and generalizability. In practice, integrating such models into clinical workflows could improve patient management and outcomes.
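For readers unfamiliar with the method, the following is a minimal sketch of Gaussian Naive Bayes with 5-fold cross-validation, written in plain Python. It is not the code used in this study: the function names (`fit_gnb`, `predict_gnb`, `cross_validate`), the additive variance-smoothing constant, and the synthetic two-feature data are all assumptions made for illustration, and the feature selection, scaling, and 80/20 split described above are omitted. In practice a library implementation such as scikit-learn's `GaussianNB` would normally be used.

```python
import math
import random
from collections import defaultdict


def fit_gnb(X, y, var_smoothing=1e-9):
    """Estimate a log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small additive constant (an assumption) keeps variances non-zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_smoothing
            for col, m in zip(columns, means)
        ]
        params[label] = (math.log(n / len(X)), means, variances)
    return params


def predict_gnb(params, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = params[label]
        # Naive assumption: features are independent given the class,
        # so the per-feature Gaussian log densities simply add up.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(params, key=log_posterior)


def cross_validate(X, y, k=5, seed=0):
    """Mean accuracy over k folds drawn from a shuffled index list."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        params = fit_gnb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_gnb(params, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


# Synthetic stand-in data (hypothetical, NOT the Kaggle CKD dataset):
# two well-separated Gaussian clusters with two features each.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
X += [[rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)] for _ in range(200)]
y = [0] * 200 + [1] * 200

mean_accuracy = cross_validate(X, y, k=5)
print(f"mean 5-fold accuracy: {mean_accuracy:.4f}")
```

The sketch reports accuracy only; precision, recall, and F1-score would be computed per fold from the same predictions and averaged in the same way.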