This paper presents a comprehensive framework for automatic speech
recognition (ASR) and text refinement that leverages advanced deep
learning models to improve transcription accuracy and contextual
coherence across multiple languages, including Tamil, Kannada,
Telugu, Malayalam, and English. The framework integrates three
primary models: Wav2Vec2 for ASR, Sentence Transformer for
semantic retrieval, and GPT-2 for text generation. First, the
Wav2Vec2 model converts audio input into text, achieving a Word
Error Rate (WER) of 8% and a Character Error Rate (CER) of 5%. The
model is trained on datasets in each of the languages listed above to
support strong performance across diverse linguistic contexts. Next, the
Sentence Transformer model paraphrase-multilingual-MiniLM-L12-v2
encodes the transcribed text as vector representations, enabling
semantic similarity search over a multilingual corpus. This step
retrieves contextually relevant sentences that are used to enhance the
transcription. Finally, GPT-2 refines the output, improving coherence
and accuracy by correcting errors and filling in gaps. The overall
system achieves a BLEU score of 0.55, indicating substantial alignment
with reference texts. The
proposed methodology demonstrates the effectiveness of combining
ASR, retrieval, and generative models in producing high-quality,
coherent textual outputs from spoken language across multiple
languages.
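
The abstract reports WER and CER but does not say how they were computed. As a minimal sketch, assuming the standard definitions (edit distance between hypothesis and reference, divided by the reference length in words or characters), the two metrics could be calculated as follows. The function names `edit_distance`, `wer`, and `cer` are illustrative and do not come from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j tokens of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: char-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair, for illustration only.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Under these definitions, a WER of 8% corresponds to roughly eight word-level edits per hundred reference words. Whitespace tokenization is an assumption here; scripts such as Tamil or Malayalam may call for additional text normalization before scoring.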
R. Geetha Rajakumari (Sri Eshwar College of Engineering, India), D. Karthika Renuka (PSG College of Technology, India), L. Ashok Kumar (Thiagarajar College of Engineering, India)
Keywords: Retrieval-Augmented Generation (RAG), Transcription Accuracy, Phonetic and Syntactic Variations
Published By: ICTACT
Published In: ICTACT Journal on Data Science and Machine Learning (Volume: 6, Issue: 2, Pages: 761-764)
Date of Publication: March 2025