Automated Classification of COVID-19 Chest X-ray Images Using Ensemble Machine Learning Methods
Abstract
This study evaluates an ensemble machine learning approach for classifying chest X-ray images into three categories: Normal, COVID-19, and Lung Opacity. Using a Random Forest classifier evaluated with 5-fold cross-validation, we aimed to improve diagnostic accuracy for one of the most urgent medical challenges today: rapid and reliable COVID-19 detection. The classifier reached an average accuracy of 51%, with precision and recall varying across folds. The F1-score remained consistently around 35%, indicating that the balance between precision and recall needs improvement. Visualizations, including performance metric trends and a confusion matrix, provided further insight into the classifier's behavior and highlighted a notable degree of misclassification. Although the automated classification achieved only moderate success, our research illustrates the complexity of applying machine learning to medical imaging, especially when differentiating between diseases with overlapping radiographic features. The findings emphasize the potential of machine learning models to support diagnostic processes and suggest that advanced pre-processing techniques and larger datasets are needed for better model training. The research contributes to the growing body of knowledge in computational diagnostics and underscores the importance of developing robust, accurate machine learning tools to aid the response to the global healthcare crisis precipitated by the pandemic.
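The relationship between the reported accuracy and F1-score can be illustrated with a minimal sketch. The abstract does not state how the per-class scores were averaged or how the folds were drawn, so the code below assumes macro averaging and plain contiguous folds. The confusion-matrix counts and the names `kfold_indices`, `macro_metrics`, and `EXAMPLE_CM` are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence

CLASSES = ["Normal", "COVID-19", "Lung Opacity"]


def kfold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Split range(n_samples) into k contiguous, near-equal test folds.

    Simplification: no shuffling or stratification is applied here.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def macro_metrics(cm: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical counts (NOT the study's data): a classifier that favours
# the majority class. Rows are true classes, columns are predictions.
EXAMPLE_CM = [
    [90, 5, 5],   # Normal
    [40, 8, 2],   # COVID-19
    [40, 2, 8],   # Lung Opacity
]

if __name__ == "__main__":
    print([len(f) for f in kfold_indices(12, k=5)])
    for name, value in macro_metrics(EXAMPLE_CM).items():
        print(f"{name}: {value:.3f}")
```

Under these assumed counts, the macro F1-score falls well below the accuracy because the two minority classes are rarely recovered, which is one way a gap such as the one reported above can arise.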

Copyright (c) 2024 Indonesian Journal of Data and Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.