Doctor of Philosophy
School of Electrical, Computer and Telecommunications Engineering
Ritz, Christian, Decomposition and interpolation techniques for very low bit rate wideband speech coding, Doctor of Philosophy thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2003. http://ro.uow.edu.au/theses/1945
Applications that require the transmission or storage of speech, such as mobile telephony, operate in limited bandwidth environments. Such applications rely on efficient speech coding algorithms to make the best use of the available bandwidth. Most existing research has focused on the coding of narrowband speech signals, which are typically sampled at 8 kHz, and has led to many low bit rate solutions. Recent and emerging applications, including video telephony and Internet telephony, call for speech of much higher quality than narrowband speech can provide. Such quality can be obtained using wideband speech, which is sampled at 16 kHz, twice the narrowband rate.
Existing research into wideband speech coding has mainly focused on bit rates around 16 kbps and above. This thesis presents new research into wideband speech coding at the much lower rates of 6 kbps and below. It is envisaged that wideband speech coding at these rates will become increasingly important as the bandwidth available for the applications described above decreases.
The quantisation of the speech spectral envelope, represented by the parameters derived from Linear Prediction (LP), consumes a large proportion of the total bit rate of a speech coder. As a precursor to much of what is achieved in this thesis, a detailed investigation is performed into Temporal Decomposition (TD), an LP quantisation technique used in narrowband speech coding. Subsequently, techniques that optimise the algorithm for wideband speech are described. A new technique called split TD is described and applied to wideband speech. Results for the quantisation of wideband LP parameters using TD show significant reductions in the bit rate required to accurately represent the LP spectral parameters compared with existing alternative approaches.
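As a rough illustration of the general TD idea, the sketch below rebuilds a trajectory of spectral parameter vectors as a weighted sum of a few event targets. It is a minimal sketch and not the algorithm developed in the thesis: the triangular event functions, the function names `triangular_event` and `td_reconstruct`, and the example values are all assumptions made for illustration.

```python
def triangular_event(centre, prev_centre, next_centre, n):
    """Hypothetical event function: 1 at its own centre, falling linearly
    to 0 at the neighbouring event centres (an illustrative assumption)."""
    if n == centre:
        return 1.0
    if prev_centre is not None and prev_centre < n < centre:
        return (n - prev_centre) / (centre - prev_centre)
    if next_centre is not None and centre < n < next_centre:
        return (next_centre - n) / (next_centre - centre)
    return 0.0


def td_reconstruct(targets, centres, num_frames):
    """Approximate each frame's parameter vector as sum_k phi_k(n) * target_k."""
    dim = len(targets[0])
    frames = []
    for n in range(num_frames):
        frame = [0.0] * dim
        for k, (target, centre) in enumerate(zip(targets, centres)):
            prev_c = centres[k - 1] if k > 0 else None
            next_c = centres[k + 1] if k + 1 < len(centres) else None
            weight = triangular_event(centre, prev_c, next_c, n)
            for d, value in enumerate(target):
                frame[d] += weight * value
        frames.append(frame)
    return frames


# Illustrative usage: two 2-dimensional targets located at frames 0 and 4.
example_targets = [[0.0, 1.0], [2.0, 3.0]]
example_centres = [0, 4]
trajectory = td_reconstruct(example_targets, example_centres, 5)
```

Only the targets and the event locations need to be conveyed to rebuild the five frames here, which hints at why a representation of this kind can lower the bit rate spent on spectral parameters.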
The remainder of this thesis is devoted to a detailed investigation into the analysis and decomposition requirements of Waveform Interpolation (WI) applied to wideband speech coding. WI is a low bit rate speech coding technique that has previously been applied only to narrowband speech. An investigation into the LP requirements for wideband speech reveals that the order of the analysis should be double that used in narrowband WI to obtain a residual that is appropriate for wideband WI. An investigation into the properties of the Characteristic Waveforms (CWs) derived for wideband speech indicates that only the low frequencies of the CWs benefit from decomposition and, as a consequence, a frequency dependent low pass filtering technique is proposed. For the quantisation of the high frequencies, a modulated noise model is proposed. Also investigated is a new decomposition technique based on a statistical analysis of the CWs. This technique separates the CWs into statistically distinct parameters.
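The effect of doubling the LP order can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. The example below is illustrative only: the orders 10 and 20, the 320-sample Hamming-windowed frame, and the synthetic noise input are assumptions chosen for the sketch and are not taken from the thesis.

```python
import math
import random


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) with a[0] = 1, where the prediction of x[n] is
    -sum(a[k] * x[n - k] for k in 1..order) and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input; stop the recursion early
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / err
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + reflection * prev[m - k]
        a[m] = reflection
        err *= 1.0 - reflection * reflection
    return a, err


# Illustrative usage on a synthetic frame (320 samples is 20 ms at 16 kHz).
random.seed(0)
frame = [random.gauss(0.0, 1.0) for _ in range(320)]
windowed = [w * s for w, s in zip(hamming(320), frame)]
r = autocorrelation(windowed, 20)
a_nb, err_nb = levinson_durbin(r, 10)  # assumed narrowband-style order
a_wb, err_wb = levinson_durbin(r, 20)  # doubled order
```

Because each step of the recursion scales the residual energy by a factor no greater than one, the higher-order analysis never leaves more residual energy than the lower-order one on the same frame. Whether the resulting residual is suitable for WI is a separate question that the thesis addresses experimentally.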
Finally, this thesis describes results for wideband WI coders operating at 2.5 kbps, 4 kbps and 6 kbps. Through extensive subjective listening tests it is shown that WI is a promising approach to wideband speech coding at rates around 4 kbps.