ICTACT Journals

REAL-TIME JOINT ACOUSTIC ECHO CANCELLATION AND PERSONALIZED SPEECH ENHANCEMENT BASED ON CROSS-ATTENTION ALIGNMENT AND X-VECTOR

ICTACT Journal on Communication Technology (Volume: 17, Issue: 1)

Abstract

Personalized speech enhancement (PSE) removes interfering speech, background noise, and reverberation using a speaker embedding, such as a d-vector or x-vector, extracted from the target speaker. In full-duplex communication scenarios, acoustic echo arises when the far-end signal is captured by the microphone along with the near-end signal. This echo is one of the major factors degrading the sound quality of online communication systems such as video conferencing. Hence, acoustic echo cancellation (AEC), a technique that can effectively remove these echoes, has been investigated. In full-duplex communication, where acoustic echo coexists with background noise and interfering speech, AEC and PSE must be combined. We study this combination. Our goal is to develop a causal model, applicable to various model architectures, that efficiently handles AEC, PSE, and joint AEC-PSE. Features are extracted from the far-end and near-end signals. A cross-attention alignment mechanism aligns the far-end features, and x-vectors serve as the speaker embedding. The proposed method is applied to PSE models such as E3Net and VoiceFilter-Lite. Extensive experiments on several standard audio datasets and real recordings demonstrate the effectiveness of the proposed method across various evaluation metrics.
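To make the idea of cross-attention alignment concrete, the following is a minimal sketch, not the authors' implementation. It assumes frame-level feature vectors, no learned query/key/value projections, and a hypothetical `max_delay` parameter that caps how far back (in frames) the echo may lag. Each microphone frame attends only to current and past far-end frames, so the operation stays causal.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_align(mic_feats, far_feats, max_delay):
    """Align far-end frames to microphone frames with causal cross-attention.

    mic_feats, far_feats: lists of equal-length feature vectors, one per frame.
    max_delay: assumed upper bound on the echo delay, in frames (hypothetical).
    Returns one aligned far-end vector per microphone frame.
    """
    if len(mic_feats) != len(far_feats):
        raise ValueError("both streams must have the same number of frames")
    dim = len(mic_feats[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    aligned = []
    for t, query in enumerate(mic_feats):
        # Causal window: only far-end frames t - max_delay .. t are visible.
        window = far_feats[max(0, t - max_delay):t + 1]
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in window]
        weights = softmax(scores)
        # Weighted sum of the far-end frames in the window.
        aligned.append([sum(w * frame[d] for w, frame in zip(weights, window))
                        for d in range(dim)])
    return aligned
```

In a full model, the aligned far-end features would typically be combined with the microphone features and the speaker embedding before mask estimation; how the paper does this is not stated in the abstract.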

Authors

Kwon Kim, Yong-Hun Yun, Chol-Nam Om
Kim Il Sung University, Democratic People’s Republic of Korea

Keywords

Personalized Speech Enhancement, Acoustic Echo Cancellation, Cross-Attention Alignment, X-Vector

Published By

ICTACT

Published In

ICTACT Journal on Communication Technology (Volume: 17, Issue: 1)

Date of Publication

March 2026

Pages

3801 - 3809

DOI

10.21917/ijct.2026.0562
