Comparative Analysis of Speech-to-Text APIs for Supporting Communication of the Deaf Community
DOI: https://doi.org/10.56705/ijodas.v6i3.327

Keywords: Speech-to-Text, API, Word Error Rate (WER), Words Per Minute (WPM), Deaf Community

Abstract
Hearing impairment can profoundly affect the mental and emotional state of those who experience it, hindering communication and delaying direct access to information by forcing reliance on interpreters. Advances in assistive technology, especially speech recognition systems that convert spoken language into written text (speech-to-text), offer a way to close this gap. Implementation, however, faces challenges related to the accuracy of each speech-to-text Application Programming Interface (API), which in turn requires an appropriate deep learning model. This study analyzes and compares the performance of three speech-to-text API services (Deepgram API, Google API, and Whisper AI) on Word Error Rate (WER) and Words Per Minute (WPM) to determine the most suitable API for a web-based real-time transcription system built with JavaScript and hosted on Glitch.com. Each service was tested by measuring its error rate and transcription speed; a lower WER and a higher WPM indicate better performance. On average, Whisper AI achieved a WER of 0% across all word categories, but its speed was lower than that of the other two APIs. The Deepgram API showed the best balance between accuracy and speed, with an average WER of 13.78% and 67 WPM. The Google API performed stably, but its WER was slightly higher than Deepgram's. Based on these results, the Deepgram API was judged the most suitable for live transcription: it produces fast transcriptions with an acceptable error rate, significantly increasing the accessibility of information for the deaf community.
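As a point of reference for the two metrics, the sketch below shows how they can be computed in JavaScript, the language used for the system described above. WER is taken in its standard form, WER = (S + D + I) / N: the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the API's hypothesis, divided by the number of reference words; WPM is the transcribed word count over elapsed minutes. This is an illustrative implementation, not the study's own evaluation code.

// Word Error Rate: WER = (S + D + I) / N, computed as the word-level
// Levenshtein distance between reference and hypothesis, divided by
// the number of reference words.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.toLowerCase().trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // dp[i][j] = minimum edits needed to align the first i reference
  // words with the first j hypothesis words (standard Levenshtein DP).
  const dp = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + cost  // substitution or match
      );
    }
  }
  return dp[ref.length][hyp.length] / ref.length;
}

// Words Per Minute: transcribed word count over elapsed minutes.
function wordsPerMinute(transcript, elapsedSeconds) {
  const words = transcript.trim().split(/\s+/).filter(Boolean).length;
  return words / (elapsedSeconds / 60);
}

// One substitution against a five-word reference gives WER = 20%.
console.log(wordErrorRate("the quick brown fox jumps",
                          "the quick brown dog jumps")); // 0.2

A browser-based real-time pipeline of the kind evaluated here can be outlined with the standard Web Speech API, which is how browsers such as Chrome expose Google's recognizer to JavaScript. The sketch below is a rough outline under that assumption; the element ID is hypothetical and the study's actual implementation is not reproduced here.

// A minimal sketch of a browser-side live-captioning loop using the
// standard Web Speech API. Illustrative only; not the study's code.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new Recognition();
recognition.continuous = true;      // keep listening across utterances
recognition.interimResults = true;  // stream partial hypotheses for low latency

const captionsEl = document.getElementById("captions"); // hypothetical output element
let startedAt = null;

recognition.onstart = () => { startedAt = performance.now(); };

recognition.onresult = (event) => {
  // Concatenate all hypotheses (interim and final) received so far.
  let transcript = "";
  for (let i = 0; i < event.results.length; i++) {
    transcript += event.results[i][0].transcript + " ";
  }
  captionsEl.textContent = transcript;

  // Live WPM estimate: word count over minutes elapsed since start.
  const minutes = (performance.now() - startedAt) / 60000;
  const words = transcript.trim().split(/\s+/).filter(Boolean).length;
  console.log("WPM:", (words / minutes).toFixed(1));
};

recognition.start(); // requires microphone permission in the browser

Deepgram or Whisper integrations would swap this recognizer for streaming or batched calls to their respective services, while the WER and WPM helpers above stay unchanged.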
License
Copyright (c) 2025 Anik Nur Handayani, Hariyono Hariyono, Ahmad Munjin Nasih, Rochmawati Rochmawati, Imanuel Hitipeuw, Harits Ar Rosyid, Jevri Tri Ardiansah, Rafli Indar Praja, Ahmad Nurdiansyah, Desi Fatkhi Azizah

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.