Predictive Modeling of Air Quality Levels Using Decision Tree Classification: Insights from Environmental and Demographic Factors
Abstract
Air pollution poses a significant global challenge, adversely affecting public health and environmental sustainability. Understanding the factors that influence air quality is essential for developing effective mitigation strategies. This study analyses key environmental and demographic factors, such as PM2.5 concentration, population density, and proximity to industrial areas, to predict air quality levels using a Decision Tree model. The dataset, comprising 5,000 samples, was pre-processed by encoding the target variable and applying Z-score normalization to the numerical features. The model was trained on 80% of the data and evaluated on the remaining 20%, achieving an accuracy of 93% on this held-out set. A classification report and a confusion matrix demonstrated the model's effectiveness in distinguishing between four air quality categories: Good, Moderate, Poor, and Hazardous. PM2.5 emerged as the most important predictor, followed by demographic and industrial factors. These findings underscore the potential of machine learning models to provide actionable insights for air quality management. For public policy, the results highlight the need for targeted interventions in high-risk areas and the importance of incorporating environmental data into urban planning. Future work should focus on expanding the feature set and exploring ensemble techniques to further improve predictive accuracy and robustness.
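The abstract describes a standard supervised pipeline: label-encode the target, Z-score the numerical features, split the data 80/20, fit a decision tree, then report accuracy, a classification report, a confusion matrix, and feature importances. The sketch below shows one way to wire those steps together, assuming scikit-learn and NumPy. The column names, the generator `make_synthetic_data`, and its labelling rule are hypothetical stand-ins rather than the study's dataset or code, so whatever accuracy it prints says nothing about the reported 93%.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names; the real dataset's columns may differ.
FEATURES = ["PM2.5", "Population_Density", "Proximity_to_Industrial_Areas"]
CLASSES = ["Good", "Moderate", "Poor", "Hazardous"]


def make_synthetic_data(n=5000, seed=0):
    """Hypothetical stand-in data: NOT the study's dataset."""
    rng = np.random.default_rng(seed)
    pm25 = rng.gamma(shape=2.0, scale=20.0, size=n)       # ug/m^3, right-skewed
    density = rng.normal(loc=500.0, scale=150.0, size=n)  # people per km^2
    proximity = rng.uniform(1.0, 25.0, size=n)            # km to nearest industrial area
    # Hypothetical labelling rule: PM2.5 dominates; density and proximity shift the score.
    score = pm25 + 0.02 * (density - 500.0) - 0.5 * (proximity - 13.0)
    idx = np.digitize(score, bins=[25.0, 50.0, 90.0])     # class index 0..3
    X = np.column_stack([pm25, density, proximity])
    return X, np.array(CLASSES)[idx]


def run_pipeline(X, y_text, seed=42):
    # 1. Encode the text labels as integers (LabelEncoder sorts them alphabetically).
    encoder = LabelEncoder()
    y = encoder.fit_transform(y_text)

    # 2. 80/20 split, stratified so every class appears in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )

    # 3. Z-score normalization, fitted on the training split only to avoid leakage.
    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)
    X_test_s = scaler.transform(X_test)

    # 4. Fit a decision tree with default hyperparameters.
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train_s, y_train)

    # 5. Evaluate on the held-out 20%.
    y_pred = model.predict(X_test_s)
    acc = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred, target_names=encoder.classes_)
    cm = confusion_matrix(y_test, y_pred)
    importances = dict(zip(FEATURES, model.feature_importances_))
    return acc, report, cm, importances


if __name__ == "__main__":
    X, y_text = make_synthetic_data()
    acc, report, cm, importances = run_pipeline(X, y_text)
    print(f"Accuracy: {acc:.3f}")
    print(report)
    print(cm)
    for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {value:.3f}")
```

Two design choices here are assumptions, not details taken from the abstract. First, the scaler is fitted on the training split only, which keeps test-set statistics out of training. Second, the tree uses scikit-learn's defaults. Because a tree splits on per-feature thresholds, an affine rescaling such as the Z-score generally does not change which partitions it can learn; the scaling step is kept only to mirror the described preprocessing. Note also that `LabelEncoder` orders the classes alphabetically (Good, Hazardous, Moderate, Poor), so the confusion-matrix rows and columns follow that order rather than severity order.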

Copyright (c) 2024 Indonesian Journal of Data and Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.