Development of a Decision Tree Classifier for Breast Cancer Diagnosis Using Fine Needle Aspirate Data

  • Agus Halid Universitas Almarisah Madani
  • I Gusti Ngurah Wikranta Arsa Institut Teknologi dan Bisnis STIKOM Bali https://orcid.org/0000-0002-9018-8419
  • Rezania Agramanisti Azdy Universitas Bina Darma
  • Agus Aan Jiwa Permana Universitas Pendidikan Ganesha

Keywords: Breast Cancer, Cross-Validation, Decision Tree, Machine Learning, Medical Diagnostics, Predictive Modelling

Abstract

Breast cancer is one of the leading causes of mortality among women globally, which makes early and accurate detection essential for improving survival rates. This study develops a decision tree classifier for distinguishing between benign and malignant breast masses using the Kaggle Breast Cancer FNA dataset. The dataset was pre-processed by removing irrelevant columns, cleaning the data, encoding the labels, and scaling the features. The model was evaluated using 5-fold cross-validation, achieving an average accuracy of 84.0%, with a test set accuracy of 83.72%. Performance metrics such as precision, recall, and F1-score further validated the model's robustness, with an overall accuracy of 90.24% on the test set. The decision tree classifier is highly interpretable, making it a practical tool for aiding clinical decision-making. While the results are promising, the study highlights opportunities for improvement, including the use of ensemble methods and larger datasets to enhance generalizability. This research contributes to the growing body of evidence supporting machine learning applications in medical diagnostics, particularly in breast cancer detection.
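
The evaluation procedure named in the abstract (label encoding, 5-fold cross-validation, and accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a synthetic one-feature dataset in place of the Kaggle FNA data, and a depth-1 decision tree (a decision stump) in place of the full classifier. The "M"/"B" label strings and all function names are assumptions made for illustration.

```python
import random

# Assumed label encoding: benign -> 0, malignant -> 1 (hypothetical label strings).
LABELS = {"B": 0, "M": 1}


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for FNA data: one numeric feature and an "M"/"B" label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = "M" if rng.random() < 0.4 else "B"
        centre = 3.0 if label == "M" else 1.0
        rows.append((rng.gauss(centre, 1.0), label))
    return rows


def encode(rows):
    """Label encoding: replace the string label with its integer code."""
    return [(x, LABELS[label]) for x, label in rows]


def fit_stump(train):
    """Depth-1 decision tree: choose the threshold with the best training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in train}):
        acc = sum(int(x > t) == y for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive (malignant) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, k=5):
    """Fit on k-1 folds, score on the held-out fold, and repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held]
        test = [rows[i] for i in held_out]
        threshold = fit_stump(train)
        y_true = [y for _, y in test]
        y_pred = [int(x > threshold) for x, _ in test]
        scores.append(metrics(y_true, y_pred))
    return scores


scores = cross_validate(encode(make_toy_data()), k=5)
mean_accuracy = sum(s["accuracy"] for s in scores) / len(scores)
print(f"mean 5-fold accuracy on toy data: {mean_accuracy:.3f}")
```

Because the threshold is refitted inside each fold, every score comes from samples the stump did not see during fitting. That is the property a cross-validated average is meant to capture, and it is why such an average can differ from the accuracy measured on a single held-out test split.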

Published
2024-12-31
How to Cite
Halid, A., Wikranta Arsa, I. G. N., Azdy, R. A., & Jiwa Permana, A. A. (2024). Development of a Decision Tree Classifier for Breast Cancer Diagnosis Using Fine Needle Aspirate Data. Indonesian Journal of Data and Science, 5(3), 229-236. https://doi.org/10.56705/ijodas.v5i3.202