Classification of Skin Diseases using Decision Tree Algorithm on an Imbalanced Dataset
Abstract
Skin infections caused by pathogens such as bacteria, fungi, and viruses are common and can lead to serious health complications if not properly managed, so accurate classification is crucial for effective treatment. This study classifies two skin diseases, Chickenpox and Shingles, using a Decision Tree algorithm applied to an imbalanced dataset sourced from Kaggle. The dataset was split into training (80%) and testing (20%) subsets. Pre-processing consisted of thresholding-based segmentation to isolate regions of interest, followed by feature extraction with Hu Moments to capture the shape characteristics of the lesions. The extracted features were then standardized to a mean of 0 and a variance of 1. The classifier was evaluated using 5-fold cross-validation, yielding a mean accuracy of 66.06%, with precision, recall, and F1-scores likewise indicating moderate performance. These results highlight the challenges posed by imbalanced datasets and the limitations of the Decision Tree algorithm in this context: proper pre-processing and feature extraction matter, but more advanced classification techniques and data balancing methods are also needed. The study contributes a detailed methodology and a comprehensive set of evaluation metrics, offering insight into the application of machine learning to medical image classification. Future work should focus on improving classifier performance through data augmentation, more advanced feature extraction, and machine learning models better suited to imbalanced datasets.
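The pipeline summarized in the abstract (thresholding, Hu Moments, standardization, Decision Tree, 5-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' code, and several parts are assumptions: the images are synthetic ellipses rather than the Kaggle data, the segmentation is a fixed global threshold of 0.5, cross-validation is run on the training split with macro-averaged metrics, and the helper names `hu_moments` and `synthetic_lesion` are hypothetical. The scores it prints therefore say nothing about the reported 66.06%.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def hu_moments(mask):
    """Seven Hu moment invariants of a binary mask (2-D array)."""
    m = mask.astype(float)
    ys, xs = np.mgrid[: m.shape[0], : m.shape[1]]
    m00 = m.sum()
    if m00 == 0:  # empty segmentation: no shape to describe
        return np.zeros(7)
    xc, yc = (xs * m).sum() / m00, (ys * m).sum() / m00

    def eta(p, q):
        # Normalised central moment of order (p, q)
        mu = (((xs - xc) ** p) * ((ys - yc) ** q) * m).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])


def synthetic_lesion(rng, elongated, size=48):
    """Hypothetical stand-in for a lesion photo: a bright, noisy ellipse."""
    ys, xs = np.mgrid[:size, :size]
    ry = rng.uniform(6, 9)
    rx = ry * (rng.uniform(1.6, 2.0) if elongated else rng.uniform(0.9, 1.1))
    inside = ((xs - size / 2) / rx) ** 2 + ((ys - size / 2) / ry) ** 2 <= 1
    return 0.8 * inside + rng.normal(0, 0.05, (size, size))


rng = np.random.default_rng(0)
labels = np.array([0] * 90 + [1] * 30)  # deliberately imbalanced, 3:1
images = [synthetic_lesion(rng, elongated=bool(y)) for y in labels]

# Segmentation: fixed global threshold (assumed; the study does not name the variant)
X = np.array([hu_moments(img > 0.5) for img in images])

# 80/20 split, stratified so both subsets keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0
)

# The scaler sits inside the pipeline so it is refit on each training fold only
model = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
scores = cross_validate(model, X_train, y_train, cv=cv, scoring=metrics)
for name in metrics:
    print(f"{name}: {scores['test_' + name].mean():.3f}")

# Held-out check on the 20% split
test_accuracy = model.fit(X_train, y_train).score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Stratifying both the split and the folds keeps the minority class present in every subset, which matters on imbalanced data. Macro-averaged precision, recall, and F1 weight both classes equally, so they expose minority-class errors that accuracy alone can hide.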