Year

2022

Degree Name

Doctor of Philosophy

Department

School of Electrical, Computer and Telecommunications Engineering

Abstract

A clear recording of speech is increasingly crucial for human-machine interaction, given the swift development of modern electronic and smart devices that employ artificial intelligence (AI) techniques, such as robots, autonomous vehicles and smart home assistants. Moreover, the outbreak of the coronavirus disease (COVID) has led to a growing demand for remote applications in nearly all industries, including teleconferencing, remote teaching, telemedicine, hands-free telephony and mobile apps. None of these machines and applications can succeed without clear speech recordings. However, noise, echoes and other interfering sounds are always present in the real world, which makes recording and processing the target speech highly challenging. Therefore, a high-quality, high-efficiency and low-cost speech recording and processing method that is robust to adverse environments is urgently needed.

Existing recording approaches use an array of microphones uniformly spaced in a device to obtain an output that emphasizes sound from target directions and suppresses sound from all other directions. Jointly processing the multiple microphone recordings to form a single output is termed beamforming, and it can yield much clearer recordings than a single microphone in the presence of noise, reverberation and simultaneous interfering sounds. However, speech is time-variant and broadband: it contains signals over a wide range of frequencies, all of which vary quickly. Recording and processing speech therefore call for microphone arrays that can handle these challenges, but traditional uniform microphone arrays (UMAs) are limited in this respect.
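
As a rough illustration of the beamforming idea described above, the following minimal sketch computes the magnitude response of a basic delay-and-sum beamformer on a uniform linear array. It is not the method developed in the thesis; the function name `uma_response`, the far-field plane-wave model, the speed of sound of 343 m/s and the example geometry (4 microphones, 4 cm spacing) are all assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def uma_response(num_mics, spacing_m, freq_hz, steer_deg, look_deg):
    """Magnitude response of a delay-and-sum beamformer on a uniform linear array.

    Angles are measured from the array axis (0 deg = endfire, 90 deg = broadside).
    A far-field plane wave and identical omnidirectional microphones are assumed.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber in rad/m
    # Path-length difference between adjacent microphones, relative to the
    # steering direction (zero when look_deg equals steer_deg).
    delta = spacing_m * (
        math.cos(math.radians(look_deg)) - math.cos(math.radians(steer_deg))
    )
    # Sum the phase-aligned microphone signals and normalise by the array size.
    total = sum(cmath.exp(1j * k * m * delta) for m in range(num_mics))
    return abs(total) / num_mics


if __name__ == "__main__":
    # Hypothetical array: 4 microphones, 4 cm apart, steered to broadside.
    for freq in (200.0, 2000.0):
        on_axis = uma_response(4, 0.04, freq, 90.0, 90.0)
        off_axis = uma_response(4, 0.04, freq, 90.0, 30.0)
        print(f"{freq:6.0f} Hz  on-axis {on_axis:.3f}  off-axis {off_axis:.3f}")
```

With this assumed geometry, sound arriving 60 degrees off the steering direction is clearly attenuated at 2000 Hz but passes almost unchanged at 200 Hz, which illustrates why a single uniform spacing struggles to cover a broadband signal such as speech.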

FoR codes (2008)

020301 Acoustics and Acoustical Devices; Waves, 090609 Signal Processing, 080199 Artificial Intelligence and Image Processing not elsewhere classified

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.