ENHANCING ASR ACCURACY AND COHERENCE ACROSS INDIAN LANGUAGES WITH WAV2VEC2 AND GPT-2

ICTACT Journal on Data Science and Machine Learning (Volume: 6, Issue: 2)

Abstract

This paper presents a framework for automatic speech recognition (ASR) and text refinement that uses deep learning models to improve transcription accuracy and contextual coherence across multiple languages, including Tamil, Kannada, Telugu, Malayalam, and English. The framework integrates three models: Wav2Vec2 for ASR, a Sentence Transformer for semantic retrieval, and GPT-2 for text generation. First, Wav2Vec2 converts audio input into text, achieving a Word Error Rate (WER) of 8% and a Character Error Rate (CER) of 5%; the model is trained on datasets in the languages listed above to perform well across diverse linguistic contexts. Next, the Sentence Transformer model paraphrase-multilingual-MiniLM-L12-v2 encodes the transcribed text as vector representations, enabling semantic similarity search over a multilingual corpus. This step retrieves contextually relevant sentences that are used to enhance the transcription. Finally, GPT-2 refines the output, correcting errors and filling gaps to improve coherence and accuracy. Evaluated with BLEU, the overall system achieves a score of 0.55, indicating substantial alignment with the reference texts. The proposed methodology demonstrates the effectiveness of combining ASR, retrieval, and generative models to produce high-quality, coherent text from spoken language across multiple languages.
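The abstract reports WER and CER for the Wav2Vec2 stage and describes a semantic similarity search over sentence embeddings. As an illustration only (not the authors' code), the sketch below shows the standard forms of these computations in dependency-free Python: WER and CER as Levenshtein edit distance normalized by reference length, and top-k retrieval by cosine similarity. The toy vectors stand in for embeddings; in the described system they would come from paraphrase-multilingual-MiniLM-L12-v2 (for example via `SentenceTransformer(...).encode` in the sentence-transformers library). The helper names (`edit_distance`, `wer`, `cer`, `cosine`, `top_k`) are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words.

    Assumes a non-empty reference; words are split on whitespace.
    """
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length.

    Characters here are Unicode code points, and spaces are counted (a convention choice).
    """
    return edit_distance(reference, hypothesis) / len(reference)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], corpus_vecs: List[Sequence[float]], k: int = 1) -> List[int]:
    """Indices of the k corpus vectors most similar to the query, best first."""
    scores = [cosine(query_vec, c) for c in corpus_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("kitten", "sitting"))  # 3 edits over 6 characters
    print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]], k=2))
```

For Tamil, Kannada, Telugu, or Malayalam text, `cer` as written counts Unicode code points, so one visible syllable (a consonant plus a vowel sign) can count as two or more characters; segmenting into grapheme clusters first would give a different CER. The abstract does not state which convention was used for the reported 5%.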

Authors

R. Geetha Rajakumari¹, D. Karthika Renuka², L. Ashok Kumar³
¹Sri Eshwar College of Engineering, India; ²PSG College of Technology, India; ³Thiagarajar College of Engineering, India

Keywords

Retrieval-Augmented Generation (RAG), Transcription Accuracy, Phonetic and Syntactic Variations

Published By
ICTACT
Date of Publication
March 2025
Pages
761-764