MULTI-INPUT DEEP COMPLEX CONVOLUTION RECURRENT NETWORK BASED JOINT ACOUSTIC ECHO CANCELLATION AND BACKGROUND NOISE SUPPRESSION

ICTACT Journal on Communication Technology (Volume: 17, Issue: 1)

Abstract

The most challenging problem in video conferencing systems is the degradation of sound quality due to various noise sources. Speech enhancement includes background noise reduction, acoustic echo cancellation, and dereverberation. A number of studies have addressed the removal of acoustic echo and background noise in video conferencing systems, and deep neural network (DNN) approaches built on classical digital signal processing techniques have recently been applied to speech processing, leading to great progress. We first propose a multi-input deep complex convolution recurrent network (MIDCCRN) for noise suppression. Using this network, we then propose a model for joint acoustic echo cancellation and background noise suppression in online voice communication systems, including video conferencing systems. The best performance of the proposed method is demonstrated by experiments using objective metrics, namely echo return loss enhancement (ERLE), signal-to-artifacts ratio (SAR), and scale-invariant source-to-noise ratio (SI-SNR); mean opinion score (MOS) as a subjective metric; and AECMOS, real-time factor (RTF), network size, and final score.
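
For readers less familiar with the metrics named in the abstract, the following is a minimal sketch of commonly used definitions of ERLE, SI-SNR, and RTF. The function names are hypothetical and chosen for illustration only; the paper's own evaluation code and exact formulations are not given in the abstract and may differ.

```python
import math


def _energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def erle_db(mic, residual, eps=1e-12):
    """ERLE in dB: microphone-signal energy over residual-signal energy.

    Assumption: evaluated on far-end single-talk segments, where the
    microphone signal contains only echo.
    """
    return 10.0 * math.log10((_energy(mic) + eps) / (_energy(residual) + eps))


def si_snr_db(estimate, target, eps=1e-12):
    """Scale-invariant SNR in dB between an estimate and a clean target."""
    # Remove the means so a DC offset does not affect the score.
    e_mean = sum(estimate) / len(estimate)
    t_mean = sum(target) / len(target)
    e = [v - e_mean for v in estimate]
    t = [v - t_mean for v in target]
    # Project the estimate onto the target; the remainder counts as noise.
    scale = sum(a * b for a, b in zip(e, t)) / (_energy(t) + eps)
    s_target = [scale * v for v in t]
    e_noise = [a - b for a, b in zip(e, s_target)]
    return 10.0 * math.log10((_energy(s_target) + eps) / (_energy(e_noise) + eps))


def real_time_factor(processing_seconds, audio_seconds):
    """RTF: processing time divided by audio duration (below 1 is faster than real time)."""
    return processing_seconds / audio_seconds
```

Under these definitions, a higher ERLE or SI-SNR indicates better echo or noise removal, while a lower RTF indicates a cheaper model to run.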

Authors

Kum-Song Pak¹, Chol-I Om², Kwon Kim³, Chol-Ui Ri⁴, Chol-Nam Om⁵
¹ ³ ⁴ ⁵ Kim Il Sung University, Democratic People’s Republic of Korea; ² University of Sciences, Democratic People’s Republic of Korea

Keywords

Acoustic Echo Cancellation (AEC), Background Noise Suppression (BNS), Multi-Input Deep Complex Convolution Recurrent Network (MIDCCRN), Speech Enhancement (SE)

Published By

ICTACT

Published In

ICTACT Journal on Communication Technology (Volume: 17, Issue: 1)

Date of Publication

March 2026

Pages

3834 - 3841