Comparison Analysis of Classification Model Performance in Lung Cancer Prediction Using Decision Tree, Naive Bayes, and Support Vector Machine
Abstract
This study compares the performance of three classification models, the Decision Tree Classifier, the Support Vector Machine, and the Naive Bayes Classifier, in predicting lung cancer on the "Lung Cancer Prediction" dataset. Performance was evaluated with four metrics: accuracy, weighted precision, weighted recall, and weighted F1 score. As a preliminary step, exploratory data analysis (EDA) and dataset preprocessing were carried out, including feature selection, data cleaning, and data transformation. On the test data, the Decision Tree Classifier and the Naive Bayes Classifier performed similarly, with high accuracy, precision, recall, and F1 values. The Support Vector Machine was also competitive, although its weighted precision was slightly lower. An outlier analysis using box plots found 2 outlier values for the Decision Tree Classifier, 4 for the Support Vector Machine, and none for Naive Bayes. In conclusion, all three models showed good potential for lung cancer prediction. Selecting the best model, however, requires weighing the evaluation metrics most relevant to the application and accounting for the limitations of each model. Further evaluation and in-depth analysis are needed to confirm that the models predict lung cancer cases accurately and consistently.
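To make the weighted metrics named in the abstract concrete, the sketch below computes support-weighted precision, recall, and F1 in plain Python. The abstract does not say how the authors computed these values, so the function name `weighted_scores` and the toy labels are assumptions for illustration only; the weighting scheme shown (each class's score weighted by its share of the true labels) is the usual meaning of "weighted" averaging.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (precision, recall, f1), each averaged over classes
    with weights proportional to the class's number of true samples."""
    support = Counter(y_true)          # true sample count per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        # True positives and total predictions for this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        p_c = tp / n_pred if n_pred else 0.0
        r_c = tp / n_true
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_true / total        # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Hypothetical labels, not taken from the study's dataset
y_true = ["YES", "YES", "YES", "NO"]
y_pred = ["YES", "YES", "NO", "NO"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Under this weighting, weighted recall always equals accuracy, which is one reason the metrics in such comparisons tend to move together; weighted precision and F1 are the values that can separate otherwise similar models.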

Copyright (c) 2023 Indonesian Journal of Data and Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.