Optimizing Neurodegenerative Disease Classification with Canny Segmentation and Voting Classifier: An Imbalanced Dataset Study
Abstract
This study evaluates a Voting Classifier that combines Logistic Regression, Random Forest, and Gaussian Naive Bayes for classifying neurodegenerative diseases, focusing on Alzheimer's Disease (AD), Parkinson's Disease (PD), and control groups. The dataset was pre-processed with Canny segmentation, and features were extracted as Hu Moments, with the aim of addressing the challenges that imbalanced datasets pose in medical image classification. The classifier was evaluated with 5-fold cross-validation using accuracy, precision, recall, and F1-score. Recall was consistent at approximately 46% across all folds, indicating stable, if moderate, sensitivity to cases of neurodegenerative disease. Precision and F1-score were notably lower, averaging around 22% and 29%, respectively, which underscores the difficulty of achieving accurate classification on imbalanced datasets. The study contributes to the understanding of machine learning in medical diagnostics, specifically in the challenging context of neurodegenerative disease classification. It highlights the potential of combining image processing techniques with machine learning ensembles to improve diagnostic accuracy, and it draws attention to the inherent limitations of such approaches, particularly low precision on imbalanced data. Recommendations for future research include data balancing techniques, alternative feature extraction methods, and different machine learning algorithms to improve precision and overall performance. Applying the model to a broader and more diverse dataset could also yield more generalizable and robust findings. The study is relevant to researchers and practitioners in medical imaging and machine learning, offering insight into the complexities and potential of automated disease classification.
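The gap between recall and precision reported above is easier to interpret with the metric arithmetic written out. The following is a minimal sketch, assuming support-weighted averaging over the three classes; the abstract does not state which averaging scheme was used. The function name `weighted_metrics`, the label strings, and the class counts are all hypothetical and chosen only for illustration. It shows that under weighted averaging, recall always equals accuracy, and that a classifier which leans entirely on the majority class can score much higher on recall than on precision. It illustrates how the metrics behave and makes no claim about the study's model or its class distribution.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Hypothetical helper for illustration; the averaging scheme is an
    assumption, since the source does not specify one.
    """
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        # A class that is never predicted gets precision 0 by convention.
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented, imbalanced label set: the majority class is 46% of samples.
y_true = ["AD"] * 46 + ["PD"] * 30 + ["Control"] * 24
# A degenerate baseline that predicts the majority class for every sample.
y_pred = ["AD"] * len(y_true)

scores = weighted_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

On these invented labels the baseline reaches a weighted recall of 0.46 with a weighted precision of about 0.21 and an F1 of about 0.29, which is why per-class results or a confusion matrix are useful alongside averaged scores when classes are imbalanced.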
