Predicting Plant Growth Stages Using Random Forest Classifier: A Machine Learning Approach

Ilham Ilham

doi:10.56705/ijodas.v5i2.167

Authors

Ilham Ilham, Universitas DIPA Makassar

Keywords:

Machine Learning, Plant Growth, Random Forest, Precision Agriculture, Environmental Factors

Abstract

Optimizing plant growth through predictive modelling is a crucial aspect of modern agricultural practice. This study investigates the use of a Random Forest Classifier to predict plant growth stages from environmental and management factors. The dataset, sourced from Kaggle, includes soil type, sunlight hours, water frequency, fertilizer type, temperature, and humidity. Pre-processing consists of encoding the categorical variables, scaling the data, and splitting it into training (80%) and testing (20%) sets. The Random Forest Classifier is evaluated with 5-fold cross-validation using accuracy, precision, recall, and F1-score. The model achieves an average accuracy of 84.27%, precision of 85.59%, recall of 84.27%, and F1-score of 83.98%. Correlation heatmaps, PCA plots, t-SNE plots, and violin plots are used to examine the data structure and the relationships between features. The results support the hypothesis that machine learning can effectively predict plant growth stages, with significant implications for precision agriculture: by accurately identifying growth stages, farmers and greenhouse managers can optimize resource allocation and management practices, leading to higher crop yields and improved sustainability. The study is limited by its reliance on a single, specific dataset and on the Random Forest Classifier alone. Future research should explore additional machine learning models and incorporate more diverse datasets to improve generalizability. The findings contribute to the growing body of knowledge on machine learning in agriculture and suggest practical applications for improving agricultural productivity.
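The abstract reports the same value (84.27%) for accuracy and recall. That is the pattern one would expect if the per-class metrics were averaged with weights proportional to class support, because support-weighted recall is algebraically equal to accuracy. The abstract does not state which averaging scheme was used, so this is an assumption. The minimal sketch below uses made-up labels and a hypothetical helper, `weighted_scores`, to illustrate the relationship; it is not the study's code or data.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) using support-weighted averaging.

    Assumption: each class contributes in proportion to its number of true
    samples. The abstract does not confirm that this scheme was used.
    """
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    precision = recall = f1 = 0.0
    for label, count in support.items():
        # True positives and total predictions for this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return accuracy, precision, recall, f1


# Hypothetical labels, invented purely for illustration
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

In this sketch the weighted recall matches the accuracy exactly, while precision and F1 differ from it, which mirrors the pattern of the reported figures. Under macro averaging (equal weight per class) the recall would generally not equal the accuracy.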
