ENHANCING ASR ACCURACY AND COHERENCE ACROSS INDIAN LANGUAGES WITH WAV2VEC2 AND GPT-2
Abstract
This paper presents a framework for automatic speech recognition (ASR) and text refinement that improves transcription accuracy and contextual coherence across multiple languages, including Tamil, Kannada, Telugu, Malayalam, and English. The framework integrates three models: Wav2Vec2 for ASR, a Sentence Transformer for semantic retrieval, and GPT-2 for text generation. First, Wav2Vec2 converts audio input into text, achieving a Word Error Rate (WER) of 8% and a Character Error Rate (CER) of 5%. The model is trained on datasets in these languages to ensure high performance across diverse linguistic contexts. Next, the Sentence Transformer model paraphrase-multilingual-MiniLM-L12-v2 encodes the transcribed text as vectors, enabling semantic similarity search within a multilingual corpus and the retrieval of contextually relevant sentences to enhance the transcription. Finally, GPT-2 refines the output, improving coherence and accuracy by correcting errors and filling in gaps. The overall system achieves a BLEU score of 0.55, indicating substantial alignment with reference texts. The results demonstrate the effectiveness of combining ASR, retrieval, and generative models to produce high-quality, coherent text from speech across multiple languages.
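
For readers unfamiliar with the two error metrics quoted above, the following is a minimal sketch of how WER and CER are conventionally computed, as edit distance divided by reference length. It is not the authors' evaluation code: the function names `edit_distance`, `wer`, and `cer` are illustrative, whitespace tokenization is an assumption, and characters are taken to be Unicode code points, which for Indic scripts can differ from user-perceived characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference item
                            curr[j - 1] + 1,      # insertion of a hypothesis item
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: code-point-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Under this definition, a WER of 8% corresponds to roughly 8 word-level edits per 100 reference words, and both rates can exceed 100% when the hypothesis contains many insertions.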

Authors
R. Geetha Rajakumari [1], D. Karthika Renuka [2], L. Ashok Kumar [3]
[1] Sri Eshwar College of Engineering, India; [2] PSG College of Technology, India; [3] Thiagarajar College of Engineering, India

Keywords
Retrieval-Augmented Generation (RAG), Transcription Accuracy, Phonetic and Syntactic Variations
Published By:
ICTACT
Published In:
ICTACT Journal on Data Science and Machine Learning
(Volume: 6, Issue: 2, Pages: 761-764)
Date of Publication:
March 2025

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.