Sarcasm and Irony Detection in Lazada App Reviews Using IndoBERT
DOI: https://doi.org/10.56705/ijodas.v6i3.307
Keywords: Natural Language Processing, IndoBERT, Sarcasm Detection, Irony Detection, E-Commerce Reviews
Abstract
Digital technology has reshaped consumer behavior, particularly in e-commerce, where Google Play Store reviews provide rich feedback but often contain sarcasm and irony that conventional sentiment models misread. This study proposes an Indonesian sarcasm–irony detection model based on IndoBERT, a transformer pre-trained on Indonesian corpora. A dataset of 1,998 Lazada app reviews was collected via web scraping and preprocessed through text cleaning, tokenization, and stopword removal with the Sastrawi library. IndoBERT was fine-tuned to classify reviews into three classes: sarcasm, irony, and literal. Performance was assessed using accuracy, precision, recall, F1-score, and a confusion matrix. The model achieved 96.40% accuracy, with F1-scores of 0.9725 (sarcasm), 0.9675 (irony), and 0.9267 (literal). Word cloud visualizations revealed distinct lexical patterns across classes, supporting IndoBERT’s ability to capture the contextual cues behind implicit sentiment. The findings indicate that IndoBERT is effective for advanced opinion mining in Indonesian e-commerce, with potential applications in customer feedback monitoring, surfacing hidden complaints, and improving recommendation systems beyond surface polarity. Limitations include reliance on a single platform (Google Play) and text-only input, without modeling non-textual signals such as emojis or punctuation intensity. Future work should test cross-platform generalization, incorporate non-textual cues, and apply data augmentation to reduce class imbalance, particularly for the less frequent literal class, to improve robustness for real-world deployment.
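The evaluation metrics named in the abstract (accuracy, per-class precision, recall, F1-score, and a confusion matrix) can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: the function names and the six toy labels below are hypothetical and chosen only to make the arithmetic easy to follow. Only the three class names come from the abstract.

```python
from typing import Dict, List

# Class names taken from the abstract; everything else here is illustrative.
LABELS = ["sarcasm", "irony", "literal"]


def confusion_matrix(y_true: List[str], y_pred: List[str],
                     labels: List[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of all reviews that sit on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_scores(matrix: List[List[int]],
                     labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1 for each class (one-vs-rest)."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp   # predicted as class i, but wrong
        fn = sum(matrix[i]) - tp                  # true class i, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical toy labels, NOT data from the study.
y_true = ["sarcasm", "sarcasm", "irony", "irony", "literal", "literal"]
y_pred = ["sarcasm", "irony", "irony", "irony", "literal", "sarcasm"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
scores = per_class_scores(matrix, LABELS)
print(matrix)
print(round(accuracy(matrix), 4))
for label in LABELS:
    print(label, {k: round(v, 4) for k, v in scores[label].items()})
```

Reporting F1 per class alongside overall accuracy matters for a three-class task like this one, because a less frequent class can score noticeably lower than the others while accuracy stays high.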
License
Copyright (c) 2025 Nabila Putri

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors retain copyright and full publishing rights to their articles. Upon acceptance, authors grant Indonesian Journal of Data and Science a non-exclusive license to publish the work and to identify itself as the original publisher.
Self-archiving. Authors may deposit the submitted version, accepted manuscript, and version of record in institutional or subject repositories, with citation to the published article and a link to the version of record on the journal website.
Commercial permissions. Uses intended for commercial advantage or monetary compensation are not permitted under CC BY-NC 4.0. For permissions, contact the editorial office at ijodas.journal@gmail.com.
Legacy notice. Some earlier PDFs may display “Copyright © [Journal Name]” or only a CC BY-NC logo without the full license text. To ensure clarity, the authors maintain copyright, and all articles are distributed under CC BY-NC 4.0. Where any discrepancy exists, this policy and the article landing-page license statement prevail.










