Doctor of Philosophy
School of Electrical, Computer and Telecommunications Engineering
Ad-hoc microphone arrays formed from the microphones of mobile devices such as smartphones, tablets and notebooks are emerging recording platforms for meetings, press conferences and other sound scenes. Unlike Wireless Acoustic Sensor Networks (WASNs), the microphones of an ad-hoc array do not communicate with one another, and the location of each microphone is unknown. The goal of this thesis is to analyse speech signals and the acoustic scene recorded with ad-hoc microphone arrays. Unlike conventional microphone arrays with known geometry (e.g. a uniform linear array), ad-hoc arrays have no fixed geometry or structure; therefore, standard speech processing techniques such as beamforming and dereverberation cannot be applied to them directly. The main reasons are the unknown distances between the microphones, and hence the unknown relative time delays, and the changeable array topology.
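The unknown relative time delays mentioned above can, in principle, be estimated from the recordings themselves. A minimal sketch, assuming a single dominant source, two signals sampled synchronously at a common rate `fs`, and NumPy, uses the standard GCC-PHAT cross-correlation. The function name `gcc_phat_delay` is hypothetical, and this is a generic textbook technique, not the method used in the thesis.

```python
import numpy as np

def gcc_phat_delay(x, y, fs):
    """Estimate how far y lags x, in seconds, using GCC-PHAT.

    Hypothetical helper for illustration; assumes one dominant source and
    two signals sampled synchronously at the same rate fs.
    """
    n = len(x) + len(y)                   # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = Y * np.conj(X)                    # cross-power spectrum
    R /= np.abs(R) + 1e-12                # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)             # generalised cross-correlation
    half = n // 2
    # Reorder so that index i corresponds to a lag of (i - half) samples.
    cc = np.concatenate((cc[-half:], cc[:half + 1]))
    lag = int(np.argmax(np.abs(cc))) - half
    return lag / fs
```

For example, delaying a white-noise signal by 23 samples and passing the original and delayed versions to the function should return a delay of 23 / `fs`; swapping the arguments flips the sign. In real ad-hoc recordings, unsynchronised device clocks would violate the sampling assumption above.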
This thesis focuses on utilising side information obtained from acoustic scene analysis to improve speech enhancement with ad-hoc microphone arrays randomly distributed within a reverberant environment. New discriminative features are proposed, applied and tested in various signal and audio processing applications such as microphone clustering, source localisation, multi-channel dereverberation, source counting and multi-talk detection. The main contributions of this thesis fall into two categories: (1) novel spatial features extracted from Room Impulse Responses (RIRs) and speech signals, and (2) speech enhancement and acoustic scene analysis methods specifically designed for ad-hoc arrays.
Microphone clustering, source localisation, speech enhancement, source counting and multi-talk detection with ad-hoc arrays are investigated in this thesis, and novel methods are proposed and tested. A clustered speech enhancement and dereverberation method tailored to ad-hoc microphones is proposed, and it is concluded that using only the cluster of microphones located closer to the source improves dereverberation performance. A multi-channel speech dereverberation method for ad-hoc microphones is also proposed, based on a novel spatial multi-channel linear prediction (MCLP) analysis. This spatially modified MCLP approach takes into account the estimated relative distances between the source and the microphones, and improves dereverberation performance. Coherence-based features are applied to multi-talk detection and source counting in highly reverberant environments, and they are shown to be reliable source counting features for ad-hoc microphones. The proposed methods achieve highly accurate offline source counting and pseudo real-time multi-talk detection.
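The abstract does not define the coherence-based features themselves. As a hedged illustration of the underlying quantity only, the sketch below computes the standard Welch-averaged magnitude-squared coherence between two microphone signals with NumPy. The function name `msc` and the frame parameters `nperseg` and `hop` are hypothetical choices, and any feature built on top of it (e.g. averaging over frequency) would be an assumption rather than the thesis's method.

```python
import numpy as np

def msc(x, y, nperseg=512, hop=256):
    """Welch-averaged magnitude-squared coherence |S_xy|^2 / (S_xx * S_yy).

    Hypothetical helper for illustration; x and y must have equal length of
    at least nperseg samples. Returns nperseg // 2 + 1 values in [0, 1].
    """
    win = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    n_bins = nperseg // 2 + 1
    Sxx = np.zeros(n_bins)
    Syy = np.zeros(n_bins)
    Sxy = np.zeros(n_bins, dtype=complex)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + nperseg)
        X = np.fft.rfft(win * x[seg])
        Y = np.fft.rfft(win * y[seg])
        Sxx += np.abs(X) ** 2             # auto-spectrum of x
        Syy += np.abs(Y) ** 2             # auto-spectrum of y
        Sxy += X * np.conj(Y)             # cross-spectrum, summed over frames
    return np.abs(Sxy) ** 2 / (Sxx * Syy + 1e-12)
```

Two copies of the same signal give values close to 1 in every frequency bin, whereas two independent noise signals give values close to 0; this contrast is the kind of information a coherence-based feature can exploit.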
Pasha, Shahab, Analysis and Enhancement of Spatial Sound Scenes Recorded using Ad-Hoc Microphone Arrays, Doctor of Philosophy thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2017. https://ro.uow.edu.au/theses1/450