Master of Philosophy
School of Electrical, Computer and Telecommunications Engineering
The problem of separating mixtures of speech signals has long been an active research topic in speech processing. Many speech separation approaches have been proposed, and a successful separation system would benefit numerous applications, such as hands-free communication systems. However, the separation performance of existing techniques is still unsatisfactory in terms of both speech quality and speech intelligibility. Recently, data-driven approaches to speech signal processing problems, in which information learnt from example databases of speech recordings is used to derive new signal processing algorithms, have shown significant success. Consequently, this thesis investigates one data-driven model for speech separation, namely non-negative matrix factorisation (NMF), together with related methods, with the expectation of achieving higher speech quality and speech intelligibility of the separated speech sources than existing approaches. Specifically, Chapter 3 proposes an NMF approach modified with spectral magnitude masks typically derived for single-channel speech separation. Chapter 4 then proposes an enhanced NMF approach that utilises estimated direction-of-arrival information to realise multi-channel speech separation. Compared with the corresponding baseline methods, the proposed approaches demonstrate improvements in speech quality and intelligibility metrics, which supports the effectiveness of the approaches proposed in this thesis.
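To make the idea of combining NMF with spectral masks concrete, the following is a minimal sketch and not the method proposed in the thesis. It assumes a generic NMF with Euclidean multiplicative updates applied to a magnitude spectrogram, followed by a soft ratio mask per source. The function names `nmf` and `ratio_masks`, the random toy spectrogram, and the assignment of basis vectors to sources in `groups` are all hypothetical placeholders; in a real system the bases would typically be learnt from training recordings of each speaker.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V (freq x time) as W @ H
    using multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # time-varying activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def ratio_masks(W, H, groups, eps=1e-9):
    """Build one soft mask per source. `groups` lists the basis indices
    assumed to belong to each source (an illustrative assumption)."""
    parts = [W[:, g] @ H[g, :] for g in groups]
    total = sum(parts) + eps
    return [p / total for p in parts]

# Toy "mixture" magnitude spectrogram: 16 frequency bins, 40 frames.
rng = np.random.default_rng(1)
V = rng.random((16, 40))

W, H = nmf(V, rank=4)
masks = ratio_masks(W, H, groups=[[0, 1], [2, 3]])

# Apply each mask to the mixture magnitude to estimate each source.
estimates = [m * V for m in masks]
```

Because the masks sum to approximately one in every time-frequency bin, the source estimates add back up to the mixture magnitude. Recovering waveforms would additionally require the mixture phase and an inverse short-time Fourier transform, which this sketch omits.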
Feng, Yuxiao, Single and Multichannel Speech Source Separation using Non-Negative Matrix Factorisation Incorporating Spectral Masks, Master of Philosophy thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2017. https://ro.uow.edu.au/theses1/90