Spectral mask estimation using deep neural networks for inter-sensor data ratio model based robust DOA estimation
Accurate direction-of-arrival (DOA) estimation based on clustering the inter-sensor data ratios (ISDRs) of a single acoustic vector sensor (AVS), referred to as AVS-ISDR, relies on reliable extraction of time-frequency points with high local signal-to-noise ratio (HLSNR-TFPs), and its performance degrades in noisy environments. This paper investigates deep neural networks (DNNs) trained on noisy-clean speech pairs at different SNR levels and with different noise types to improve the performance of AVS-ISDR in noisy conditions. The DNNs are trained to learn characteristics that reflect the level of speech information at different TFPs, which helps to generate a reliable spectral mask for obtaining a noise-reduced spectrum. Correspondingly, a robust DOA estimation algorithm named AVS-DNN-ISDR has been developed. Experimental results verify that the proposed DNN-based spectral mask improves the extraction of reliable HLSNR-TFPs at different SNR levels. Results from simulations and real AVS recordings further validate that AVS-DNN-ISDR achieves high DOA estimation accuracy even when the SNR is below 0 dB.
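
To make the role of the spectral mask concrete, the following is a minimal sketch of mask-based TFP selection followed by a ratio-based DOA estimate. It is an illustration under stated assumptions, not the AVS-DNN-ISDR implementation: the mask here is an oracle ratio mask standing in for a DNN output, the AVS model is a simplified 2D single-source one (U ≈ cos(θ)·O, V ≈ sin(θ)·O), the median replaces the clustering step, and all function names, thresholds, and noise settings are hypothetical.

```python
import numpy as np

def select_hlsnr_tfps(mask, threshold=0.8):
    """Mark time-frequency points whose mask value is at least `threshold`."""
    return np.asarray(mask, dtype=float) >= threshold

def estimate_doa_deg(U, V, O, selected):
    """Estimate one azimuth (degrees) from the selected TFPs.

    Assumes a simplified 2D AVS model: U ~ cos(theta) * O, V ~ sin(theta) * O,
    so the real parts of U/O and V/O approximate cos(theta) and sin(theta).
    The median is a simple stand-in for clustering the ratios.
    """
    if not np.any(selected):
        raise ValueError("no time-frequency points were selected")
    eps = 1e-12  # guards against division by zero
    r_u = np.real(U[selected] / (O[selected] + eps))
    r_v = np.real(V[selected] / (O[selected] + eps))
    return float(np.degrees(np.arctan2(np.median(r_v), np.median(r_u))))

def _complex_noise(rng, shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def simulate_and_estimate(theta_deg=40.0, threshold=0.9, seed=0):
    """Synthetic example: half of the TFPs are heavily corrupted by noise."""
    rng = np.random.default_rng(seed)
    shape = (64, 50)  # (frequency bins, frames), arbitrary
    theta = np.radians(theta_deg)

    S = _complex_noise(rng, shape)                 # source spectrum
    noisy = rng.random(shape) < 0.5                # which TFPs are noise dominated
    sigma = np.where(noisy, 3.0, 0.05)             # per-TFP noise scale
    N_o = sigma * _complex_noise(rng, shape)
    N_u = sigma * _complex_noise(rng, shape)
    N_v = sigma * _complex_noise(rng, shape)

    O = S + N_o                                    # pressure channel
    U = np.cos(theta) * S + N_u                    # velocity channel (x)
    V = np.sin(theta) * S + N_v                    # velocity channel (y)

    # Oracle ratio mask as a placeholder for the DNN-estimated mask.
    mask = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N_o) ** 2)
    selected = select_hlsnr_tfps(mask, threshold)
    return estimate_doa_deg(U, V, O, selected), int(selected.sum())

if __name__ == "__main__":
    doa, count = simulate_and_estimate()
    print(f"estimated DOA: {doa:.1f} deg from {count} selected TFPs")
```

In this sketch the mask only decides which TFPs contribute to the ratio statistics, so a more reliable mask translates directly into fewer noise-dominated ratios entering the estimate.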