Year

2002

Degree Name

Doctor of Philosophy

Department

Faculty of Informatics

Abstract

With the digitisation of most communication channels and an ever-increasing demand for mobile communication services, the amount of traffic generated by coded speech signals continues to grow rapidly. To accommodate this increased traffic load in the finite bandwidth available for speech communication, it is necessary to develop speech compression algorithms that can dynamically scale to traffic and user demands. These scalable compression algorithms should be capable of dynamically altering the bit rate required for transmission, whilst smoothly and gradually varying the subjective quality of the synthesized speech with the changes in bit rate. To further increase the throughput of the communication channel, the scalable algorithm should operate in the lower range of bit rates currently used for speech compression (i.e. 2 to 8 kbps).

This thesis proposes a number of scalable speech coding techniques that lead to the development of a single coding algorithm that is capable of scalable operation. Firstly, the characteristics of existing speech compression algorithms that limit scalable operation between bit rates of 2 and 8 kbps are identified. The major limiting characteristics are: 1) the existence of a distinct barrier at 4 kbps, below which parametric coders dominate and above which waveform coders dominate; and 2) the large delay requirements of current low-rate coding algorithms.

A method that exploits the simultaneous masking property of the human ear in a linear predictive filter is proposed. The proposed method modifies the linear predictive filter to remove more of the perceptually important information from the input signal than a standard linear predictive filter. This characteristic is shown to improve the subjective speech quality of low-rate linear prediction based speech coders.

To enable the pitch cycle redundancies of the speech signal to be exploited in the coding algorithm without introducing excessive algorithmic delay, a novel low-delay method for segmenting the speech into non-overlapped, pitch-length subframes is proposed. This method requires only a single frame of speech and locates the pitch pulses by selecting the pulse locations in a closed-loop function. The proposed segmentation is shown to produce a much more accurate pitch track in transient sections of the speech signal than the pitch track produced by traditional autocorrelation-based pitch detectors.

A number of low-delay decomposition techniques are proposed that decompose the speech into perceptually different components and allow scalable reconstruction of the speech signal. The preferred technique performs the decomposition in a closed-loop function, allowing quantisation errors to be accounted for in the decomposition process. The proposed scalable techniques are combined to produce a scalable algorithm that operates at a range of bit rates from 2 to 8 kbps. The synthesized speech quality produced by the scalable algorithm varies smoothly as the operating bit rate is varied. A key feature of the proposed algorithm is the ability to migrate from a time-asynchronous parametric coder at low rates to a time-synchronous waveform coder at higher bit rates. The coder also requires only a single frame of algorithmic delay (30 ms) for operation. The subjective results presented indicate that the scalable coder produces speech quality comparable with that achieved by fixed-rate standardized coders at each of the tested bit rates.

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.