Machine Learning Approaches to Gastrointestinal Disease Diagnosis: An Experimental Study with Endoscopic Images

Authors

  • Nurul Rismayanti

DOI:

https://doi.org/10.56705/ijaimi.v1i1.147

Keywords:

Gastrointestinal Diseases, Endoscopic Images, Machine Learning, Random Forest, Feature Extraction

Abstract

Automatic detection of gastrointestinal (GI) diseases from endoscopic images is an emerging research area with significant implications for healthcare. This study uses the Kvasir dataset, available on Kaggle, to develop a machine learning model for disease detection and classification. The dataset, which consists of annotated images of the GI tract, was pre-processed with Canny edge detection for segmentation and Hu Moments for feature extraction. The images were split into training (80%) and testing (20%) sets. A Random Forest Classifier was trained to distinguish three classes: dyed lifted polyps, dyed resection margins, and esophagitis. Performance was evaluated using accuracy, precision, recall, and F1-score. The results showed modest performance, with accuracy ranging from 39.00% to 44.67%, precision from 40.68% to 47.34%, recall from 39.00% to 44.67%, and F1-score from 38.09% to 45.07%; for comparison, uniform random guessing among three classes would be expected to reach about 33.3% accuracy. These findings indicate that, while the Random Forest Classifier shows some potential, both the model and the pre-processing techniques leave considerable room for improvement. The study contributes an evaluation of a machine learning approach to GI disease detection and highlights the need for further research with more advanced models and more diverse datasets. Future research should focus on optimizing pre-processing methods, exploring convolutional neural networks, and expanding the dataset to improve classification performance and clinical applicability.
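
The reported recall range (39.00% to 44.67%) matches the accuracy range exactly. One possible explanation, which the abstract does not confirm, is that precision, recall, and F1-score were averaged across classes with support weighting; in that case weighted recall always equals accuracy. The sketch below illustrates that identity in plain Python. The function name `weighted_metrics`, the shortened class labels, and the toy label lists are all hypothetical and are not the study's data, code, or results.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) as fractions.

    Precision, recall, and F1 are averaged over classes, each class
    weighted by its share of the true labels (support weighting).
    This averaging scheme is an assumption, not taken from the paper.
    """
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predicted samples per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(true_pos.values()) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = true_pos[label]
        p_k = tp / predicted[label] if predicted[label] else 0.0
        r_k = tp / support[label] if support[label] else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support[label] / n
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k
    return accuracy, precision, recall, f1


# Toy labels for illustration only (hypothetical, not the study's results).
P, M, E = "polyps", "margins", "esophagitis"
y_true = [P, P, P, P, M, M, M, E, E, E]
y_pred = [P, P, M, E, M, M, P, E, E, M]

accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Under support weighting, each class's recall is multiplied by its share of the test set, so the weighted sum reduces to the number of correct predictions divided by the number of samples, which is the accuracy. Macro averaging, by contrast, would generally give a recall that differs from accuracy when the classes are imbalanced.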

Published

2023-05-30