Predictive Modeling of Air Quality Levels Using Decision Tree Classification: Insights from Environmental and Demographic Factors
Abstract
Air pollution is a significant global challenge that harms public health and environmental sustainability, and effective mitigation depends on understanding the factors that drive air quality. This study analyses key environmental and demographic factors, such as PM2.5 concentration, population density, and proximity to industrial areas, to predict air quality levels with a Decision Tree classifier. The dataset of 5000 samples was pre-processed by encoding the target variable and applying Z-score normalization to the numerical features. The model was trained on 80% of the data and evaluated on the remaining 20%, where it achieved an accuracy of 93%. A classification report and confusion matrix showed that the model distinguishes effectively between four air quality categories: Good, Moderate, Poor, and Hazardous. PM2.5 emerged as the most important predictor, followed by demographic and industrial factors. These findings underscore the potential of machine learning models to provide actionable insights for air quality management. The results inform public policy by highlighting the need for targeted interventions in high-risk areas and the importance of incorporating environmental data into urban planning. Future work should expand the feature set and explore ensemble techniques to further improve predictive accuracy and robustness.
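The workflow summarized in the abstract (label encoding, Z-score normalization, an 80/20 split, a decision tree, accuracy and a confusion matrix) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the synthetic data, the labelling rule, the depth limit, and every function name are assumptions introduced here. The sketch computes the normalization statistics on the training rows only, a detail the abstract does not specify. In practice a library such as scikit-learn would likely be used instead of a hand-written tree.

```python
import random
import statistics
from collections import Counter

LABELS = ["Good", "Moderate", "Poor", "Hazardous"]  # the four classes in the abstract


def make_synthetic_rows(n, seed=0):
    """Hypothetical stand-in data: ([pm25, density, distance_km], label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        pm25 = rng.uniform(0, 200)
        density = rng.uniform(100, 1000)
        distance = rng.uniform(1, 25)
        # Made-up labelling rule dominated by PM2.5; not taken from the study.
        score = pm25 + 0.02 * density - 0.5 * distance
        rows.append(([pm25, density, distance], LABELS[min(3, max(0, int(score // 60)))]))
    return rows


def fit_zscore(X):
    """Per-column mean and population standard deviation."""
    cols = list(zip(*X))
    return [statistics.fmean(c) for c in cols], [statistics.pstdev(c) or 1.0 for c in cols]


def apply_zscore(X, means, stds):
    """Z-score normalization: z = (x - mean) / std."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best axis-aligned split."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            w = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or w < best[0]:
                best = (w, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=5):
    """Grow a tree of (feature, threshold, left, right) tuples; leaves are class ids."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if depth < max_depth and len(set(y)) > 1 else None
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    if not left or not right:
        return majority
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class id."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


rows = make_synthetic_rows(400)
random.Random(1).shuffle(rows)
cut = int(0.8 * len(rows))  # 80% train, 20% test
train, test = rows[:cut], rows[cut:]

means, stds = fit_zscore([x for x, _ in train])  # statistics from training rows only
X_train = apply_zscore([x for x, _ in train], means, stds)
X_test = apply_zscore([x for x, _ in test], means, stds)
y_train = [LABELS.index(lab) for _, lab in train]  # label encoding: Good=0 ... Hazardous=3
y_test = [LABELS.index(lab) for _, lab in test]

tree = build_tree(X_train, y_train)
y_pred = [predict(tree, x) for x in X_test]

accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
confusion = [[0] * len(LABELS) for _ in LABELS]  # rows: true class, columns: predicted
for t, p in zip(y_test, y_pred):
    confusion[t][p] += 1

print(f"accuracy: {accuracy:.2f}")
for name, row in zip(LABELS, confusion):
    print(f"{name:>10}: {row}")
```

Because the synthetic labels are driven mostly by PM2.5, the root split of this toy tree falls on that feature. That echoes the abstract's finding that PM2.5 is the strongest predictor, but it is built into the made-up data rather than being evidence for the finding. Decision trees are also insensitive to monotonic feature scaling, so the Z-score step mainly keeps the preprocessing consistent with the described pipeline.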
Authors retain copyright and full publishing rights to their articles. Upon acceptance, authors grant Indonesian Journal of Data and Science a non-exclusive license to publish the work and to identify itself as the original publisher.
Self-archiving. Authors may deposit the submitted version, accepted manuscript, and version of record in institutional or subject repositories, with citation to the published article and a link to the version of record on the journal website.
Commercial permissions. Uses intended for commercial advantage or monetary compensation are not permitted under CC BY-NC 4.0. For permissions, contact the editorial office at [editorial email/contact form].
Legacy notice. Some earlier PDFs may display “Copyright © [Journal Name]” or only a CC BY-NC logo without the full license text. For the avoidance of doubt, authors hold copyright, and all articles are distributed under CC BY-NC 4.0. Where any discrepancy exists, this policy and the article landing-page license statement prevail.