Year

2011

Degree Name

Doctor of Philosophy

Department

School of Electrical, Computer and Telecommunications Engineering

Abstract

Capturing speech signals for enhancement is an important stage in all modern communication systems. Traditionally, speech enhancement has been performed on single-channel recordings, but the advantages of multichannel speech processing have recently been identified. Multichannel speech signals are captured using a microphone array. By using the spatio-temporal information at the output of the array, the direction of the source can be derived and the captured signals can be spatially filtered, an approach that has shown superior performance over single-channel methods. The spatially distributed microphone arrays generally used in speech signal processing capture only the acoustic pressure. In this thesis, however, a co-located microphone array that captures both acoustic pressure and particle velocity, known as an Acoustic Vector Sensor (AVS), is used to capture speech signals for enhancement.

The AVS used in this work consists of two pressure gradient sensors and an omni-directional microphone, which together enable speech signals to be captured in 2D. Compared with other microphone arrays, the AVS array is small, occupying a volume of approximately 1 cm³. Its small size enables it to be used in mobile electronic devices, such as mobile phones and mobile personal computers, which traditionally have a single microphone capsule.

In this thesis, a design change for the AVS is presented that improves the accuracy of Direction of Arrival (DOA) estimates obtained from the AVS. It is shown that by offsetting the directional sensors on the AVS array, the source direction can be identified with an accuracy of two degrees for a stationary speech source and five degrees for both moving and multiple speech sources. Here, DOA estimates are found using the MUltiple SIgnal Classification (MUSIC) algorithm in the time domain and an intensity-based algorithm in the frequency domain. For multiple sources, a new data clustering technique is combined with the existing frequency-domain intensity-based algorithm.
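
The frequency-domain intensity-based DOA estimate can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes STFT-domain pressure bins p and particle-velocity bins vx, vy that are already calibrated and signed so that a plane wave from azimuth θ gives vx = p cos θ and vy = p sin θ. Under this convention the active intensity components Ix = Re{p* vx} and Iy = Re{p* vy} give the source azimuth as θ = atan2(Iy, Ix). The function names are hypothetical.

```python
import cmath
import math

def bin_doas(p_bins, vx_bins, vy_bins):
    """Per-bin DOA estimates in degrees [0, 360) from the active intensity.

    Assumed (hypothetical) calibration: for a plane wave from azimuth theta,
    vx = p*cos(theta) and vy = p*sin(theta).
    """
    doas = []
    for p, vx, vy in zip(p_bins, vx_bins, vy_bins):
        ix = (p.conjugate() * vx).real  # active intensity, x component
        iy = (p.conjugate() * vy).real  # active intensity, y component
        doas.append(math.degrees(math.atan2(iy, ix)) % 360.0)
    return doas

def broadband_doa(p_bins, vx_bins, vy_bins):
    """Single DOA in degrees from the intensity vector summed over all bins."""
    ix = sum((p.conjugate() * vx).real for p, vx in zip(p_bins, vx_bins))
    iy = sum((p.conjugate() * vy).real for p, vy in zip(p_bins, vy_bins))
    return math.degrees(math.atan2(iy, ix)) % 360.0

# Synthetic plane wave from 60 degrees across 8 frequency bins.
theta = math.radians(60.0)
p = [cmath.rect(1.0 + 0.1 * k, 0.3 * k) for k in range(8)]
vx = [pk * math.cos(theta) for pk in p]
vy = [pk * math.sin(theta) for pk in p]
print(round(broadband_doa(p, vx, vy), 6))  # prints 60.0
```

The per-bin estimates from `bin_doas` are the kind of data a clustering step would operate on when several sources are active. The broadband sum suits a single dominant source.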

Speech enhancement methods that take advantage of the directional characteristics of the AVS array are presented. It is shown that using the directional characteristics of the AVS to obtain the noise estimates for the Minimum Variance Distortionless Response (MVDR) beamformer yields an improvement of 1.34 in Mean Opinion Score (MOS) over the conventional MVDR beamformer. Here, the noise covariance matrix is obtained by a new technique that uses the Singular Value Decomposition (SVD) of the AVS array outputs. Furthermore, it is shown that applying the Griffiths and Jim (GJ) beamformer to the AVS output channels achieves an improvement of 1.74 MOS over unprocessed noise-corrupted speech signals in listening tests.
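
For each frequency bin, the MVDR beamformer uses the weights w = R^-1 d / (d^H R^-1 d), where R is the noise covariance matrix and d is the steering vector. These weights pass the look direction with unit gain while minimizing output noise power. The minimal Python sketch below assumes an idealized 3-channel AVS steering vector d(θ) = [1, cos θ, sin θ] and takes R as given. It does not reproduce the thesis's SVD-based noise covariance estimate, and the function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def avs_steering(theta_deg):
    """Idealized (assumed) AVS steering vector: omni, x-gradient, y-gradient."""
    t = math.radians(theta_deg)
    return [1.0 + 0j, complex(math.cos(t)), complex(math.sin(t))]

def mvdr_weights(r_noise, d):
    """w = R^-1 d / (d^H R^-1 d)."""
    rinv_d = solve(r_noise, d)
    denom = sum(di.conjugate() * xi for di, xi in zip(d, rinv_d))
    return [xi / denom for xi in rinv_d]

r_noise = [[1.0, 0.2, 0.1], [0.2, 0.8, 0.05], [0.1, 0.05, 0.9]]
d = avs_steering(45.0)
w = mvdr_weights(r_noise, d)
response = sum(wi.conjugate() * di for wi, di in zip(w, d))
print(abs(response - 1.0) < 1e-9)  # True: unit gain toward the look direction
```

The beamformer output for a bin is y = w^H x, where x holds the three AVS channel values for that bin. How well it suppresses noise depends mainly on how accurately R is estimated, which is where the noise estimation step described above comes in.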

A new speech enhancement technique is presented that applies Linear Predictive (LP) spectrum-based perceptual filtering to recordings obtained from an AVS. The technique takes advantage of the directional polar responses of the AVS to obtain a significantly more accurate representation of the LP spectrum of a target speech signal in the presence of noise than single-channel, omni-directional recordings provide. Listening test results show a significant improvement of 1.6 MOS over unprocessed noise-corrupted speech. The proposed LP spectrum-based perceptual filter is further improved by introducing an averaged autocorrelation function to obtain a multichannel LP spectrum from the directional components of the AVS array. With the averaged autocorrelation function, an improvement of 1.98 MOS over unprocessed noise-corrupted speech signals is achieved.
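
The averaged autocorrelation step can be sketched as follows. The autocorrelation sequences of the AVS channels are averaged lag by lag. The LP coefficients are then obtained from the averaged sequence with the Levinson-Durbin recursion, and the LP spectrum is err / |A(e^{jω})|². This minimal Python sketch assumes equal channel weights and a biased autocorrelation estimate, and all function names are hypothetical. It does not reproduce the thesis's framing, windowing, or the perceptual filter itself.

```python
import cmath
import math
import random

def autocorr(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] x[n+k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def averaged_autocorr(channels, max_lag):
    """Lag-by-lag mean of per-channel autocorrelations (equal weights assumed)."""
    rs = [autocorr(ch, max_lag) for ch in channels]
    return [sum(r[k] for r in rs) / len(rs) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err): A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, final prediction error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lp_spectrum(a, err, n_points):
    """LP power spectrum err / |A(e^{jw})|^2 at n_points frequencies in [0, pi)."""
    spec = []
    for m in range(n_points):
        w = math.pi * m / n_points
        a_w = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        spec.append(err / abs(a_w) ** 2)
    return spec

# Two synthetic "directional" channels: a shared tone plus independent noise.
rng = random.Random(0)
channels = [[g * math.sin(0.3 * n) + 0.1 * rng.gauss(0.0, 1.0) for n in range(400)]
            for g in (1.0, 0.5)]
a, err = levinson_durbin(averaged_autocorr(channels, 8), 8)
spec = lp_spectrum(a, err, 64)
```

Averaging before the recursion yields a single LP model for all channels. Because the biased estimate keeps the averaged Toeplitz matrix positive semi-definite, the prediction error stays non-negative.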

In addition to the perceptual filter, two Blind Source Separation (BSS) algorithms are presented: the well-known Independent Component Analysis (ICA), and a new method based on clustering DOA estimates computed on a time-frequency basis. Comparisons are made between co-located microphone arrays that contain microphones with mixed polar responses and both traditional Uniform Linear Arrays (ULA) formed from omni-directional microphones and Soundfield microphones. It is shown that the polar responses of the microphones are a key factor in the performance of ICA applied to co-located microphones. In listening tests, the two BSS algorithms achieve improvements of 1.75 and 2.09 MOS over unprocessed noise-corrupted speech signals for the ICA and DOA-based methods, respectively.
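
The DOA-clustering approach to BSS can be illustrated as follows. Each time-frequency bin's DOA estimate is clustered, and each bin of the pressure channel is then assigned to one source with a binary mask. This is a hypothetical sketch, not the thesis's clustering technique. Circular k-means, the initial centres, and the hard masks are all assumptions.

```python
import math

def circ_dist(a, b):
    """Angular distance in degrees, accounting for wrap-around at 360."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def assign(angles, centres):
    """Index of the nearest centre for each per-bin DOA estimate."""
    return [min(range(len(centres)), key=lambda c: circ_dist(a, centres[c]))
            for a in angles]

def circular_kmeans(angles, init_centres, iters=20):
    """k-means on angles, using the circular mean of unit vectors as centroid."""
    centres = list(init_centres)
    for _ in range(iters):
        labels = assign(angles, centres)
        for c in range(len(centres)):
            members = [a for a, lab in zip(angles, labels) if lab == c]
            if members:  # keep the previous centre if a cluster is empty
                s = sum(math.sin(math.radians(a)) for a in members)
                co = sum(math.cos(math.radians(a)) for a in members)
                centres[c] = math.degrees(math.atan2(s, co)) % 360.0
    return centres, assign(angles, centres)

def separate(p_bins, labels, n_sources):
    """Binary-mask the pressure-channel bins: one list of bins per source."""
    return [[p if lab == c else 0j for p, lab in zip(p_bins, labels)]
            for c in range(n_sources)]

# Per-bin DOAs from two talkers near 30 and 120 degrees.
angles = [28.0, 30.0, 32.0, 118.0, 120.0, 122.0]
centres, labels = circular_kmeans(angles, [0.0, 90.0])
outputs = separate([1 + 0j] * len(angles), labels, 2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Each masked bin list would then be inverse-transformed to obtain one time-domain signal per source. The circular distance keeps directions such as 358 and 2 degrees in the same cluster.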

Finally, the DOA estimation and clustering method for BSS is applied to the dereverberation of speech signals. It is shown that by using the directional characteristics of the AVS array, reflections arriving from different directions can be minimized. The results show improvements in Signal to Reverberant Ratio (SRR) of 1.5 dB and 2.5 dB for a source at 1 m and 5 m from the AVS array, respectively.
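
One common way to express SRR is the energy ratio, in decibels, of the direct-path signal d[n] to the reverberant residual x[n] - d[n]: SRR = 10 log10(Σ d[n]² / Σ (x[n] - d[n])²). This abstract does not state the exact SRR definition used in the thesis, so the Python sketch below is an assumption. Under it, an improvement is the SRR of the processed signal minus that of the unprocessed one, and the function names are hypothetical.

```python
import math

def srr_db(direct, observed):
    """SRR in dB: direct-path energy over the energy of (observed - direct)."""
    num = sum(d * d for d in direct)
    den = sum((x - d) ** 2 for d, x in zip(direct, observed))
    return 10.0 * math.log10(num / den)

def srr_improvement_db(direct, unprocessed, processed):
    """Positive values mean processing reduced the reverberant component."""
    return srr_db(direct, processed) - srr_db(direct, unprocessed)

direct = [1.0, -1.0, 1.0, -1.0]
unprocessed = [1.2, -0.8, 1.2, -0.8]  # residual energy 4 * 0.04 = 0.16
processed = [1.1, -0.9, 1.1, -0.9]    # residual energy 4 * 0.01 = 0.04
print(round(srr_improvement_db(direct, unprocessed, processed), 2))  # prints 6.02
```

Halving the residual amplitude quarters its energy, which is why the example gains about 6 dB.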
