Performance Analysis of the Decision Tree Classification Algorithm on the Water Quality and Potability Dataset

  • Umar Zaky, Universitas Teknologi Yogyakarta
  • Ahmad Naswin, Universitas Mega Rezky
  • Sumiyatun Sumiyatun, Universitas Teknologi Digital Indonesia
  • Aris Wahyu Murdiyanto, Universitas Jenderal Achmad Yani Yogyakarta

Keywords: Decision Tree, Water Quality, Potability, Machine Learning, Cross-validation, Environmental Science

Abstract

Ensuring water potability is paramount for public health and safety. This research assessed the efficacy of the Decision Tree classification algorithm in predicting water potability using the Water Quality and Potability dataset. Under 5-fold cross-validation, the model achieved only modest performance, with an average accuracy of approximately 54.33%. While the Decision Tree provides an interpretable baseline for classification, the results point to the need for further work with more complex models or ensemble methods. This study contributes to the broader effort to apply machine learning to water quality assessment and offers insight into the potential and limitations of such models in predicting water safety.
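
The evaluation procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not state which library, hyperparameters, or preprocessing the authors used, so everything here is an assumption: scikit-learn is chosen for illustration, the tree uses default settings, and synthetic data from `make_classification` stands in for the Water Quality and Potability dataset (nine numeric features and a binary label are assumed). The scores it produces are not the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the potability data (assumption: 9 numeric
# features and a binary potable / not-potable label).
X, y = make_classification(
    n_samples=1000, n_features=9, n_informative=5, random_state=0
)

# Decision Tree with default hyperparameters (assumption: the abstract
# does not report the settings that were actually used).
model = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation: each fold serves once as the held-out test set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# The reported figure corresponds to the mean of the per-fold accuracies.
mean_accuracy = scores.mean()
print("Per-fold accuracy:", scores)
print("Mean accuracy:", mean_accuracy)
```

Stratified folds are used here so that each fold keeps roughly the same class balance as the full data; whether the original study stratified its folds is not stated.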

Published
2023-12-31
How to Cite
Zaky, U., Naswin, A., Sumiyatun, S., & Murdiyanto, A. W. (2023). Performance Analysis of the Decision Tree Classification Algorithm on the Water Quality and Potability Dataset. Indonesian Journal of Data and Science, 4(3), 145-150. https://doi.org/10.56705/ijodas.v4i3.113