An effective target speech enhancement with single acoustic vector sensor based on the speech time-frequency sparsity

RIS ID

102940

Publication Details

Y. X. Zou, Y. Q. Wang, P. Wang, C. H. Ritz & J. Xi, "An effective target speech enhancement with single acoustic vector sensor based on the speech time-frequency sparsity," in Digital Signal Processing (DSP), 2014 19th International Conference on, 2014, pp. 547-551.

Abstract

This paper investigates the time-frequency (TF) sparsity of speech together with the unique inter-sensor characteristics of the acoustic vector sensor (AVS) to formulate an effective speech enhancement approach under the minimum mean square error (MMSE) criterion, combined with a fixed beamformer (FBF). The proposed approach exploits the inter-sensor data ratio (ISDR) of the AVS and the TF sparsity of speech to derive a mask that extracts and enhances a target speech signal recorded in the presence of a spatially separated interfering speech signal and background noise. Experimental results show that the proposed AVS-ISDRSS algorithm effectively suppresses the spatial interference and additive background noise while improving the perceptual quality of the target speech. In addition, the proposed AVS-ISDRSS algorithm does not require voice activity detection (VAD) for estimating the speech, which greatly reduces the computational complexity.
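
The masking idea in the abstract can be illustrated with a minimal sketch. It is not the AVS-ISDRSS algorithm itself, whose derivation is not given in this record. The sketch assumes an idealised two-dimensional AVS with one omnidirectional channel O and two gradient channels U and V with cos/sin directivity, a known target azimuth, no background noise, and perfectly disjoint TF supports for the two talkers. The function name isdr_mask, the tolerance tol, and the synthetic data are all hypothetical. The MMSE stage and the fixed beamformer are omitted.

```python
import numpy as np

def isdr_mask(O, U, V, target_azimuth, tol=0.2, eps=1e-12):
    """Binary TF mask from inter-sensor data ratios (illustrative only).

    Assumed sensor model: for a single source at azimuth theta,
    U/O = cos(theta) and V/O = sin(theta) in every TF bin it occupies.
    """
    r_u = np.real(U / (O + eps))          # ratio of u-gradient to omni channel
    r_v = np.real(V / (O + eps))          # ratio of v-gradient to omni channel
    # Distance between the observed ratios and those expected for the target.
    dist = np.hypot(r_u - np.cos(target_azimuth), r_v - np.sin(target_azimuth))
    active = np.abs(O) > 1e-8             # ignore bins with no energy
    return (dist < tol) & active

# Synthetic TF data (hypothetical): two sources with disjoint TF supports,
# which is the idealised form of the sparsity assumption.
rng = np.random.default_rng(0)
shape = (64, 50)                          # (frequency bins, frames)
target_support = rng.random(shape) < 0.2
interferer_support = (rng.random(shape) < 0.2) & ~target_support

def random_spectrum(support):
    values = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return values * support

S_target = random_spectrum(target_support)
S_interf = random_spectrum(interferer_support)

theta_t = np.deg2rad(45.0)                # assumed target azimuth
theta_i = np.deg2rad(135.0)               # assumed interferer azimuth

# Noise-free AVS channel mixtures under the assumed directivity model.
O = S_target + S_interf
U = np.cos(theta_t) * S_target + np.cos(theta_i) * S_interf
V = np.sin(theta_t) * S_target + np.sin(theta_i) * S_interf

mask = isdr_mask(O, U, V, theta_t)
enhanced = mask * O                       # masked omni channel
```

Under these idealised assumptions the mask selects exactly the target's TF bins. With real recordings the ratios are perturbed by noise and by bins where both talkers overlap, so the tolerance would need tuning and a hard binary mask is unlikely to be sufficient on its own.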

Please refer to publisher version or contact your library.

Link to publisher version (DOI)

http://dx.doi.org/10.1109/ICDSP.2014.6900725