Development of a Decision Tree Classifier for Breast Cancer Diagnosis Using Fine Needle Aspirate Data
Abstract
Breast cancer is one of the leading causes of mortality among women globally, and early, accurate detection improves survival rates. This study develops a decision tree classifier that distinguishes benign from malignant breast masses using the Kaggle Breast Cancer FNA (fine needle aspirate) dataset. Pre-processing comprised removal of irrelevant columns, data cleaning, label encoding, and feature scaling. Under 5-fold cross-validation, the model achieved an average accuracy of 84.0%, with a test set accuracy of 83.72%. Precision, recall, and F1-score were used to further assess the model, with a reported overall accuracy of 90.24% on the test set. Because a decision tree's splits can be read directly as rules, the classifier is highly interpretable, which makes it a practical aid to clinical decision-making. While the results are promising, the study identifies opportunities for improvement, including ensemble methods and larger datasets to enhance generalizability. This research adds to the growing body of evidence supporting machine learning in medical diagnostics, particularly breast cancer detection.
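The abstract names the evaluation steps (label encoding, feature scaling, 5-fold cross-validation, and precision, recall, and F1-score) without giving an implementation. The following is a minimal standard-library sketch of those steps, not the authors' code. The function names, the "M"/"B" label convention, the contiguous unshuffled fold layout, and the toy labels and predictions are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def encode_labels(labels, positive="M"):
    """Label encoding: map the positive class to 1 and every other label to 0.
    The "M"/"B" (malignant/benign) convention is an assumption."""
    return [1 if label == positive else 0 for label in labels]


def standardize(column):
    """Feature scaling: z-score one feature column (zero mean, unit variance)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels and predictions, for illustration only.
y_true = encode_labels(["M", "B", "B", "M", "B", "M", "B", "B", "M", "B"])
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

folds = list(k_fold_indices(len(y_true), k=5))
metrics = binary_metrics(y_true, y_pred)
scaled = standardize([1.0, 2.0, 3.0])

print(metrics)
```

In a cross-validated run, a classifier would be fitted on each fold's training indices, scored on its test indices, and the five accuracies averaged. The scaling statistics would normally be computed on the training indices only, so that no information leaks from the held-out fold.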

Authors retain copyright and full publishing rights to their articles. Upon acceptance, authors grant Indonesian Journal of Data and Science a non-exclusive license to publish the work and to identify itself as the original publisher.
Self-archiving. Authors may deposit the submitted version, accepted manuscript, and version of record in institutional or subject repositories, with citation to the published article and a link to the version of record on the journal website.
Commercial permissions. Uses intended for commercial advantage or monetary compensation are not permitted under CC BY-NC 4.0. For permissions, contact the editorial office at [editorial email/contact form].