ICTACT Journals

MULTI-INPUT DEEP COMPLEX CONVOLUTION RECURRENT NETWORK BASED JOINT ACOUSTIC ECHO CANCELLATION AND BACKGROUND NOISE SUPPRESSION

ICTACT Journal on Communication Technology (Volume: 17, Issue: 1)

Abstract

The most challenging problem in video conferencing systems is the degradation of sound quality caused by various noise sources. Speech enhancement includes background noise reduction, acoustic echo cancellation, and dereverberation. Many studies have addressed the removal of acoustic echo and background noise in video conferencing systems, and deep neural network (DNN) approaches have recently been applied to speech processing, building on classical digital signal processing techniques and leading to great progress. We first propose a multi-input deep complex convolution recurrent network (MIDCCRN) for noise suppression. We then use this network to build a model for joint acoustic echo cancellation and background noise suppression in online voice communication systems, including video conferencing systems. Experiments demonstrate that the proposed method achieves the best performance on objective metrics, including echo return loss enhancement (ERLE), signal-to-artifacts ratio (SAR), and scale-invariant source-to-noise ratio (SI-SNR); on mean opinion score (MOS) as a subjective metric; and on AECMOS, real-time factor (RTF), network size, and final score.
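For readers unfamiliar with the objective metrics named in the abstract, the following is a minimal sketch of how ERLE, SI-SNR, and RTF are commonly computed. It uses the usual textbook definitions, which are an assumption here and may differ in detail from the formulas used in the paper. The function names (`erle_db`, `si_snr_db`, `real_time_factor`) and the `eps` stabilizer are hypothetical and chosen for illustration only.

```python
import math


def _mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def erle_db(mic, residual, eps=1e-12):
    """ERLE in dB (common definition, assumed here): power of the
    microphone signal over power of the residual after echo cancellation.
    Normally measured on segments that contain far-end echo only."""
    return 10.0 * math.log10((_mean_power(mic) + eps) / (_mean_power(residual) + eps))


def si_snr_db(estimate, target, eps=1e-12):
    """Scale-invariant SNR in dB (common definition, assumed here)."""
    # Remove the mean from both signals.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [v - est_mean for v in estimate]
    tgt = [v - tgt_mean for v in target]
    # Project the estimate onto the target to get the scaled target component.
    dot_et = sum(a * b for a, b in zip(est, tgt))
    dot_tt = sum(b * b for b in tgt)
    scale = dot_et / (dot_tt + eps)
    projection = [scale * b for b in tgt]
    # Everything left over is treated as noise.
    noise = [a - p for a, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def real_time_factor(processing_seconds, audio_seconds):
    """RTF: processing time divided by audio duration; below 1 means
    the system runs faster than real time."""
    return processing_seconds / audio_seconds
```

Under these definitions, higher ERLE and SI-SNR indicate better echo and noise removal, while a lower RTF indicates cheaper inference. MOS and AECMOS are listening-quality scores and are not reproduced by this sketch.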

Authors

Kum-Song Pak¹, Chol-I Om², Kwon Kim³, Chol-Ui Ri⁴, Chol-Nam Om⁵
Kim Il Sung University, Democratic People’s Republic of Korea¹,³,⁴,⁵; University of Sciences, Democratic People’s Republic of Korea²

Keywords

Acoustic Echo Cancellation (AEC), Background Noise Suppression (BNS), Multi-Input Deep Complex Convolution Recurrent Network (MIDCCRN), Speech Enhancement (SE)

Published By

ICTACT

Published In

ICTACT Journal on Communication Technology
(Volume: 17, Issue: 1)

Date of Publication

March 2026

Pages

3834 - 3841

DOI

10.21917/ijct.2026.0566
