An effective target speech enhancement with single acoustic vector sensor based on the speech time-frequency sparsity
This paper exploits the time-frequency (TF) sparsity of speech together with the unique inter-sensor characteristics of an acoustic vector sensor (AVS) to formulate an effective speech enhancement approach that combines the minimum mean square error (MMSE) criterion with a fixed beamformer (FBF). The proposed approach uses the inter-sensor data ratio (ISDR) of the AVS and the TF sparsity of speech to derive a mask that extracts and enhances a target speech signal recorded in the presence of a spatially separated interfering speech signal and background noise. Experimental results show that the proposed AVS-ISDRSS algorithm effectively suppresses the spatial interference and additive background noise while improving the perceptual quality of the target speech. In addition, the proposed AVS-ISDRSS algorithm does not require voice activity detection (VAD) for estimating the speech, which greatly reduces the computational complexity.
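The idea of deriving a TF mask from the ISDR can be illustrated with a minimal sketch. It is not the AVS-ISDRSS algorithm itself, since the abstract gives no formulas. The sketch assumes a 2-D AVS model in which the two velocity channels are cos- and sin-scaled copies of the pressure channel, an idealised sparsity in which each TF bin holds only one source, a hard threshold `tol` standing in for the MMSE-derived mask, and a simple steering-vector weighted sum standing in for the FBF. All function names and parameters are hypothetical.

```python
import cmath
import math
import random


def simulate_avs_bins(n_bins, az_target, az_interf, noise_std=0.01, seed=0):
    """Synthetic TF bins for an assumed 2-D AVS (u, v velocity; o pressure).

    Sparsity is idealised: even bins hold only the target, odd bins only
    the interferer. Returns the three channels and ground-truth labels.
    """
    rng = random.Random(seed)

    def noise():
        return complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))

    U, V, O, is_target = [], [], [], []
    for k in range(n_bins):
        target_bin = k % 2 == 0
        az = az_target if target_bin else az_interf
        # Source coefficient with magnitude in [1, 2) and random phase.
        s = cmath.rect(1.0 + rng.random(), rng.uniform(-math.pi, math.pi))
        U.append(math.cos(az) * s + noise())
        V.append(math.sin(az) * s + noise())
        O.append(s + noise())
        is_target.append(target_bin)
    return U, V, O, is_target


def isdr_mask(U, V, O, az_target, tol=0.2, eps=1e-12):
    """Binary mask: keep bins whose ISDRs match the target direction."""
    cos_t, sin_t = math.cos(az_target), math.sin(az_target)
    mask = []
    for u, v, o in zip(U, V, O):
        if abs(o) < eps:  # empty bin, nothing to keep
            mask.append(0.0)
            continue
        r_u = (u / o).real  # approx. cos(azimuth) of the dominant source
        r_v = (v / o).real  # approx. sin(azimuth) of the dominant source
        dist = math.hypot(r_u - cos_t, r_v - sin_t)
        mask.append(1.0 if dist < tol else 0.0)
    return mask


def enhance(U, V, O, az_target, tol=0.2):
    """Steer a fixed weighted sum at the target, then apply the mask."""
    cos_t, sin_t = math.cos(az_target), math.sin(az_target)
    mask = isdr_mask(U, V, O, az_target, tol)
    # Weights a / ||a||^2 with a = [cos, sin, 1]: unit gain toward the target.
    beam = [(cos_t * u + sin_t * v + o) / 2.0 for u, v, o in zip(U, V, O)]
    return [m * b for m, b in zip(mask, beam)], mask


# Usage: target at 45 degrees, interferer at 135 degrees.
az_t, az_i = math.radians(45), math.radians(135)
U, V, O, truth = simulate_avs_bins(200, az_t, az_i)
enhanced, mask = enhance(U, V, O, az_t)
print(f"bins kept: {int(sum(mask))} of {len(mask)}")
```

Note that nothing in this sketch needs a speech/non-speech decision: the mask is computed bin by bin from the channel ratios alone, which is consistent with the abstract's remark that no VAD is required.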