Development of a Decision Tree Classifier for Breast Cancer Diagnosis Using Fine Needle Aspirate Data
Abstract
Breast cancer is one of the leading causes of mortality among women globally, necessitating early and accurate detection to improve survival rates. This study leverages machine learning to develop a decision tree classifier for distinguishing between benign and malignant breast masses using the Kaggle Breast Cancer FNA dataset. The dataset underwent rigorous pre-processing, including the removal of irrelevant columns, data cleaning, label encoding, and feature scaling. The model was evaluated using 5-fold cross-validation, achieving an average accuracy of 84.0%, with a test set accuracy of 83.72%. Performance metrics such as precision, recall, and F1-score further validated the model's robustness, with an overall accuracy of 90.24% on the test set. The decision tree classifier demonstrated high interpretability, making it a practical tool for aiding clinical decision-making. While the results are promising, the study highlights opportunities for improvement, including the use of ensemble methods and larger datasets to enhance generalizability. This research contributes to the growing body of evidence supporting machine learning applications in medical diagnostics, particularly in breast cancer detection.
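The evaluation protocol named in the abstract (5-fold cross-validation, then accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The function names `k_fold_indices` and `binary_metrics`, the unshuffled contiguous folds, and the toy labels are all hypothetical, chosen only to show how the folds are formed and how the metrics are defined.

```python
from typing import Dict, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index pairs over range(n_samples).

    Assumption: folds are contiguous and unshuffled, for readability only.
    """
    # The first (n_samples % k) folds receive one extra sample,
    # so every sample lands in exactly one validation fold.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 (malignant) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, predicted malignant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, predicted malignant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, predicted benign
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, predicted benign

    # Guard against empty denominators (no positive predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels, not data from the study: 1 = malignant, 0 = benign.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))

    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(len(y_true), k=5), start=1):
        print(f"fold {fold}: train on {len(train_idx)} samples, validate on {val_idx}")
```

In a real pipeline the classifier would be refit on each training fold and scored on the matching validation fold, and the folds would normally be shuffled or stratified by class so that each one keeps a similar benign-to-malignant ratio.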

Copyright (c) 2024 Indonesian Journal of Data and Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.