FEATURE EXTRACTION USING I-VECTOR AND X-VECTOR METHODS FOR SPEAKER DIARIZATION
Abstract
Speaker diarization is the process of determining who spoke when in an audio recording. It is important in many settings, such as meeting recordings, call-center monitoring, and media analysis. In this paper, we examine how well different speaker diarization methods perform in real-life scenarios, focusing on two modern techniques: I-vectors and X-vectors. I-vectors are effective for automatic speaker recognition because they use statistical models to produce compact, efficient speaker representations. However, they struggle when voices overlap or background noise is present. X-vectors are designed to address these limitations: they use deep neural networks to learn richer and more robust representations, which makes them better suited to challenging conditions. To evaluate the two approaches, we used two standard datasets, the AMI Meeting Corpus and VoxCeleb, and measured performance with two metrics: Diarization Error Rate (DER) and Jaccard Error Rate (JER). The results show that I-vectors are less resource-intensive and work well under ideal conditions, whereas X-vectors perform better in real-world settings where noise and overlapping speech are present. This study offers guidance to practitioners choosing between the two approaches according to their needs in accuracy, computational cost, and reliability.
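DER and JER are named above as the evaluation metrics but not defined. As an illustration only, the following is a simplified, frame-level Python sketch of how the two metrics are commonly computed. It assumes the reference and system outputs are given as equal-length lists of frames, each frame being the set of active speakers, and it finds the speaker mapping by brute force. DER is taken as (missed speech + false alarm + speaker confusion) divided by total reference speech, using the mapping that maximises matched speech. JER is taken as the mean, over reference speakers, of one minus the intersection-over-union with the mapped system speaker, as introduced for the DIHARD challenges. The names `der`, `jer`, `_mappings` and `Frames` are hypothetical. This is not the scoring setup used in the study (the abstract does not specify one), and it omits details such as the forgiveness collar applied by standard scorers.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Iterator, List, Optional, Sequence

# Hypothetical input format: one entry per fixed-length frame holding the set of
# active speakers. An empty set is silence; two or more labels is overlap.
Frames = Sequence[FrozenSet[str]]


def _mappings(src: List[str], dst: List[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield every one-to-one map from src labels to dst labels (or None).

    Brute force over all assignments, so only practical for a few speakers;
    real scorers solve this with the Hungarian algorithm instead.
    """
    slots: List[Optional[str]] = list(dst) + [None] * len(src)
    seen = set()
    for combo in permutations(slots, len(src)):
        if combo not in seen:
            seen.add(combo)
            yield dict(zip(src, combo))


def der(ref: Frames, hyp: Frames) -> float:
    """Frame-level DER = (missed + false alarm + confusion) / reference speech."""
    assert len(ref) == len(hyp), "ref and hyp must cover the same frames"
    ref_spk = sorted(set().union(*ref))
    hyp_spk = sorted(set().union(*hyp))
    total_ref = sum(len(r) for r in ref)
    # Choose the system-to-reference speaker map with the most matched speech.
    best_correct = max(
        sum(len(r & {m[h] for h in h_set}) for r, h_set in zip(ref, hyp))
        for m in _mappings(hyp_spk, ref_spk)
    )
    # Per frame, max(N_ref, N_hyp) - N_correct = miss + false alarm + confusion.
    errors = sum(max(len(r), len(h)) for r, h in zip(ref, hyp)) - best_correct
    return errors / total_ref


def jer(ref: Frames, hyp: Frames) -> float:
    """JER = mean over reference speakers of 1 - intersection / union."""
    assert len(ref) == len(hyp), "ref and hyp must cover the same frames"
    ref_at = {s: {i for i, f in enumerate(ref) if s in f} for s in set().union(*ref)}
    hyp_at = {s: {i for i, f in enumerate(hyp) if s in f} for s in set().union(*hyp)}

    def speaker_jer(r: str, h: Optional[str]) -> float:
        if h is None:
            return 1.0  # an unmapped reference speaker counts as fully wrong
        ref_i, hyp_i = ref_at[r], hyp_at[h]
        return 1.0 - len(ref_i & hyp_i) / len(ref_i | hyp_i)

    # Choose the reference-to-system speaker map that minimises mean JER.
    return min(
        sum(speaker_jer(r, m[r]) for r in ref_at) / len(ref_at)
        for m in _mappings(sorted(ref_at), sorted(hyp_at))
    )


if __name__ == "__main__":
    # Four frames: the reference says A, A, B, B; the system says x, x, x, y.
    ref = [frozenset({"A"}), frozenset({"A"}), frozenset({"B"}), frozenset({"B"})]
    hyp = [frozenset({"x"}), frozenset({"x"}), frozenset({"x"}), frozenset({"y"})]
    print(der(ref, hyp))            # 0.25 (one confused frame out of four)
    print(round(jer(ref, hyp), 3))  # 0.417
```

In this toy case a single confused frame costs 25% DER, while JER is higher because it averages per speaker: speaker B loses half of its frames to the wrong system label, so B's individual error weighs as much as A's even though B's mistake covers only one frame.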

Authors
Vinod K. Pande (1), Vijay K. Kale (2), Sangramsing N. Kayte (3)
(1, 2) Dr G.Y. Pathrikar College of Computer Science and Information Technology, India; (3) University of Copenhagen, Denmark

Keywords
Speaker Diarization, I-Vector, X-Vector, MFCC, Speech Recognition
Published By: ICTACT
Published In: ICTACT Journal on Soft Computing (Volume 15, Issue 4, Pages 3717-3721)
Date of Publication: January 2025

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.