Fine-Tuning a Large Language Model on Vertex AI for a New Student Registration Chatbot at Universitas Muhammadiyah Makassar
DOI: https://doi.org/10.56705/ijodas.v7i1.341

Keywords: Chatbot, Large Language Model (LLM), Fine-tuning, Google Cloud Vertex AI, BLEU, ROUGE-L, Customer Satisfaction Score (CSAT), Student Admission

Abstract
This study addresses the limitations of manual admission services at Universitas Muhammadiyah Makassar, which often result in delayed and inconsistent information delivery. To overcome these challenges, an institution-specific chatbot was developed by fine-tuning the Gemini 2.5 Flash model on the Google Cloud Vertex AI platform. The model was trained using a curated domain-specific dataset of 1,430 question–answer pairs derived from official documents and frequently asked questions. The fine-tuning process employed supervised learning to enhance contextual relevance and response accuracy. System performance was evaluated using automated text quality metrics, achieving an average BLEU score of 0.23526 and a ROUGE-L Recall score of 0.53424, indicating satisfactory lexical and semantic similarity. Furthermore, a user acceptance evaluation involving 52 respondents yielded a Customer Satisfaction Score (CSAT) of 84.2%, reflecting high user satisfaction. These results demonstrate that fine-tuning a Large Language Model (LLM) for specific institutional needs effectively improves both response quality and service reliability. Ultimately, this approach offers a practical and scalable solution for modernizing student admission services in higher education, ensuring that prospective students receive accurate information in a timely and efficient manner.
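The BLEU and ROUGE-L Recall metrics reported above can be sketched in plain Python. This is a simplified, single-reference, sentence-level implementation (uniform n-gram weights, no smoothing), not the exact evaluation toolkit used in the study; in practice the averages reported would be computed over the full test set of question–answer pairs.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of 1..max_n n-gram
    precisions, multiplied by the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_counts = ngram_counts(cand, n)
        r_counts = ngram_counts(ref, n)
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        precisions.append(overlap / max(sum(c_counts.values()), 1))
    if min(precisions) == 0:          # unsmoothed: any zero precision => 0
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

def rouge_l_recall(candidate, reference):
    """ROUGE-L recall: longest common subsequence length divided by
    the reference length, computed by dynamic programming."""
    c, r = candidate.split(), reference.split()
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c):
        for j, rw in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if cw == rw else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(c)][len(r)] / max(len(r), 1)
```

A ROUGE-L Recall near 0.53, as reported, means roughly half of the reference answer's tokens appear in the chatbot's response in the same order, which is why the study pairs these lexical metrics with a user-facing CSAT survey.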
License
Copyright (c) 2026 Desi Anggreani, Muhyiddin A M Hayat, Ahmad Faisal

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors retain copyright and full publishing rights to their articles. Upon acceptance, authors grant Indonesian Journal of Data and Science a non-exclusive license to publish the work and to identify itself as the original publisher.
Self-archiving. Authors may deposit the submitted version, accepted manuscript, and version of record in institutional or subject repositories, with citation to the published article and a link to the version of record on the journal website.
Commercial permissions. Uses intended for commercial advantage or monetary compensation are not permitted under CC BY-NC 4.0. For permissions, contact the editorial office at ijodas.journal@gmail.com.
Legacy notice. Some earlier PDFs may display “Copyright © [Journal Name]” or only a CC BY-NC logo without the full license text. To ensure clarity, the authors maintain copyright, and all articles are distributed under CC BY-NC 4.0. Where any discrepancy exists, this policy and the article landing-page license statement prevail.