Predicting Hair Loss with Machine Learning: A Multi-Factor Analysis

Authors

  • M. Ikbal Siami, Institut Teknologi dan Bisnis STIKOM Ambon
  • Huzain Azis, Universiti Kuala Lumpur

DOI:

https://doi.org/10.56705/ijaimi.v3i1.360

Keywords:

Hair Loss Prediction, Machine Learning, Feature Engineering, Ensemble Learning, Medical Informatics, Classification, Data Pre-processing

Abstract

Hair loss is a multifactorial condition influenced by genetics, hormonal imbalance, lifestyle choices, and environmental factors. This study investigates the potential of machine learning (ML) to predict hair loss using a diverse dataset of categorical and numerical indicators related to these contributing variables. We applied an extensive data preprocessing pipeline, including missing-value handling, frequency encoding, and engineered interaction features, to improve model input quality. Five ML algorithms (Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and XGBoost) along with an ensemble voting classifier were trained and evaluated on a balanced dataset. While performance metrics such as accuracy and F1-score remained modest, with the highest values around 50%, the analysis revealed the prominent role of age, stress, and nutritional deficiency in hair loss. Despite the limited predictive capability of the current feature set, this study presents a reproducible framework for ML-driven health diagnostics and identifies key directions for future work. Enhancing data granularity and incorporating richer clinical inputs could significantly boost prediction accuracy in subsequent studies.
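The pipeline described in the abstract (frequency encoding of categorical columns, an engineered interaction feature, several base classifiers combined in a voting ensemble, evaluation by accuracy and F1) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' actual code: the column names (`age`, `stress`, `nutrition_deficiency`, `hair_loss`) are invented stand-ins for the paper's features, and XGBoost is omitted here because it requires a separate package beyond scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic balanced dataset with hypothetical columns mirroring the
# factors the study highlights (age, stress, nutritional deficiency).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "stress": rng.choice(["low", "medium", "high"], n),
    "nutrition_deficiency": rng.integers(0, 2, n),
    "hair_loss": rng.integers(0, 2, n),   # binary target
})

# Frequency encoding: replace each category by its relative frequency.
freq = df["stress"].value_counts(normalize=True)
df["stress_freq"] = df["stress"].map(freq)

# Engineered interaction feature (age x encoded stress level).
df["age_x_stress"] = df["age"] * df["stress_freq"]

X = df[["age", "stress_freq", "nutrition_deficiency", "age_x_stress"]]
y = df["hair_loss"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Base learners combined in a soft-voting ensemble.
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]
vote = VotingClassifier(estimators=estimators, voting="soft")
vote.fit(X_tr, y_tr)
pred = vote.predict(X_te)

print(f"accuracy={accuracy_score(y_te, pred):.2f} "
      f"f1={f1_score(y_te, pred):.2f}")
```

Because the target here is random noise, scores near 0.5 are expected, which incidentally mirrors the modest ~50% performance the study reports for its real feature set.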


Published

2025-05-30