Ensemble Learning Using KNN and Decision Tree for Virus Infection Classification in Mouse Study Dataset

Aris Wahyu Murdiyanto; Thomas Edyson Tarigan; Hamada Zein

doi:10.56705/ijaimi.v3i1.359

Authors

Aris Wahyu Murdiyanto Universitas Jenderal Achmad Yani Yogyakarta https://orcid.org/0000-0002-9829-753X
Thomas Edyson Tarigan Universitas Teknologi Digital Indonesia
Hamada Zein Universitas Muhammadiyah Kalimantan Timur

Keywords:

Biomedical Data Analysis, Decision Tree, Ensemble Learning, K-Nearest Neighbor, Viral Infection Classification

Abstract

In this study, we propose an ensemble learning approach to classify the presence of viral infection in mice using the Mouse Viral Infection Study Dataset. The dataset contains two numerical features (the volumes of two administered medications) and a binary label indicating whether the virus is present. To improve prediction performance, we combined K-Nearest Neighbor (KNN) and Decision Tree (DT) classifiers in a soft voting ensemble. Features were standardized during preprocessing so that both contribute comparably, which matters especially for the distance-sensitive KNN. The ensemble's hyperparameters were tuned using GridSearchCV with 5-fold cross-validation, covering the number of neighbors for KNN and depth-related parameters for DT. On the test set, the ensemble achieved 100% accuracy, precision, recall, and F1-score. The confusion matrix showed no misclassifications, and the Receiver Operating Characteristic (ROC) curve had an Area Under the Curve (AUC) of 1.00, indicating complete separation of the two classes on the test set. These results suggest that the proposed ensemble effectively leverages the strengths of both KNN and DT, making it suitable for biomedical classification tasks where interpretability and reliability are critical. However, the simplicity of the dataset, including its balanced classes and clear feature boundaries, may have contributed to this ideal performance, so further validation on more complex or noisy datasets is necessary. This study contributes a practical, interpretable, and effective ensemble learning framework for binary classification problems in experimental virology and opens pathways for further research on hybrid classification systems in preclinical biomedical data analytics.
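
The pipeline described in the abstract (standardization, a soft voting ensemble of KNN and DT, and GridSearchCV with 5-fold cross-validation) can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' code. The data are a synthetic stand-in for the two medication volumes and the binary label, and the parameter grid, the 80/20 split, and the random seeds are assumptions, since the abstract does not state them.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): two numeric features playing the role of
# the two medication volumes, with balanced, well-separated classes.
rng = np.random.default_rng(0)
n_per_class = 200
infected = rng.normal(loc=[3.0, 3.0], scale=0.8, size=(n_per_class, 2))
healthy = rng.normal(loc=[7.0, 7.0], scale=0.8, size=(n_per_class, 2))
X = np.vstack([infected, healthy])
y = np.concatenate([np.ones(n_per_class, dtype=int),
                    np.zeros(n_per_class, dtype=int)])

# Hold out a stratified test set (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Soft voting averages the class probabilities predicted by KNN and DT.
ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)

# Scaling sits inside the pipeline so that each cross-validation fold is
# standardized with statistics from its own training portion only.
pipeline = Pipeline([("scale", StandardScaler()), ("vote", ensemble)])

# Hypothetical search grid: number of neighbors for KNN and
# depth-related parameters for DT.
param_grid = {
    "vote__knn__n_neighbors": [3, 5, 7],
    "vote__dt__max_depth": [2, 4, None],
    "vote__dt__min_samples_split": [2, 5],
}

# Exhaustive grid search with 5-fold cross-validation on the training set.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
test_accuracy = accuracy_score(y_test, y_pred)
test_auc = roc_auc_score(y_test, y_score)
print("best params:", search.best_params_)
print("test accuracy:", test_accuracy)
print("test AUC:", test_auc)
```

On data this cleanly separated, near-perfect scores are expected, which mirrors the abstract's caution that dataset simplicity may explain the ideal performance.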

