Information Extraction from Makassar Culinary Images Using Vision Transformers and Cahya GPT-2 (Visual Question Answering Case Study)
DOI:
https://doi.org/10.56705/ijodas.v6i3.357
Keywords:
Vision Transformer, Cahya_GPT-2, Visual Question Answering, Fine Tuning, Makassar Culinary Images
Abstract
This study develops a Visual Question Answering (VQA) system that extracts information from images of Makassar culinary specialties by combining the Vision Transformer (ViT) and Cahya_GPT-2 models. The main objective is to integrate visual and natural language understanding so that a computer can recognize visual objects (food images) and generate relevant text descriptions. The research follows an experimental approach: a pre-trained ViT model is fine-tuned as the visual encoder and Cahya_GPT-2 as the text decoder. The dataset contains images of Makassar culinary specialties such as Coto, Konro, Pisang Epe, Barongko, and Jalangkote, annotated with question-and-answer (QnA) pairs. Evaluation uses the ROUGE metric to measure the textual overlap between the model's answers and the reference answers. The multimodal model achieves an average ROUGE-L score of 0.63, indicating that its answers are reasonably close to the annotations and that it captures the image context well. In conclusion, combining ViT and Cahya_GPT-2 can be an effective approach for natural-language-based visual information extraction systems, especially in the Indonesian local culinary domain.
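For readers unfamiliar with the metric, the following is a minimal sketch of how a ROUGE-L F-score can be computed from the longest common subsequence (LCS) of tokens. It is not the authors' evaluation code: the function names `lcs_length` and `rouge_l`, the lowercase whitespace tokenization, and the example answer strings are assumptions made for illustration, and the published score may have been produced with a different tokenizer or ROUGE library.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-score between a generated answer and a reference answer.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


# Invented example pair, not taken from the study's dataset.
reference_answer = "coto makassar adalah sup daging sapi"
model_answer = "coto makassar adalah sup daging"
score = rouge_l(model_answer, reference_answer)
print(round(score, 3))
```

In this invented pair the model answer matches five of the six reference tokens in order, so precision is 1, recall is 5/6, and the F-score is 10/11. A dataset-level figure such as the reported average would be the mean of this score over all question-answer pairs, assuming per-example averaging.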
License
Copyright (c) 2025 Tirta Chiantalia Sharief, Hazriani, Syamsul, Anas, Yuyun

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors retain copyright and full publishing rights to their articles. Upon acceptance, authors grant Indonesian Journal of Data and Science a non-exclusive license to publish the work and to identify itself as the original publisher.
Self-archiving. Authors may deposit the submitted version, accepted manuscript, and version of record in institutional or subject repositories, with citation to the published article and a link to the version of record on the journal website.
Commercial permissions. Uses intended for commercial advantage or monetary compensation are not permitted under CC BY-NC 4.0. For permissions, contact the editorial office at ijodas.journal@gmail.com.
Legacy notice. Some earlier PDFs may display “Copyright © [Journal Name]” or only a CC BY-NC logo without the full license text. To ensure clarity, the authors maintain copyright, and all articles are distributed under CC BY-NC 4.0. Where any discrepancy exists, this policy and the article landing-page license statement prevail.










