Doctor of Philosophy
University of Wollongong. School of Electrical, Computer and Telecommunications Engineering
Cheng, Bin, Spatial squeezing techniques for low bit-rate multichannel audio coding, Doctor of Philosophy thesis, University of Wollongong. School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2011. https://ro.uow.edu.au/theses/3243
In recent years, significant research has focused on the efficient compression and representation of multichannel spatial audio signals. Recent developments in this area exploit spatial audio cues that represent inter-channel mathematical relationships. The original multichannel signal is downmixed to a backward compatible mono/stereo signal, while the spatial cues are used to recover the surround sound. These approaches have been shown to provide efficient coding of multichannel spatial audio signals, in terms of both bit-rate reduction and perceptual quality. However, they have drawbacks: the spatial cues do not represent perceptually relevant information, which can result in inefficient quantisation, as well as perceptual distortion of the localisation characteristics of the sound field. Furthermore, because the downmixing and spatial cue derivation algorithm in these approaches is specifically designed to suit a certain multichannel audio format, the flexibility and extensibility for coding future multichannel audio formats is limited.
Spatially Squeezing Surround Audio Coding (S3AC) is presented in this thesis as an alternative, efficient solution for the representation of spatial audio signals. Based on estimating sound source and localisation information in the spatial sound field, the fundamental idea in S3AC is to represent a surround soundfield with a ‘squeezed’ soundfield by exploiting perceptual localisation irrelevancy. In particular, it is shown that only limited precision is required to represent the localisation information of a surround soundfield without perceptual distortion, and that the localisation precision computationally derivable from a small soundfield is adequate to preserve the perceptual localisation information of a surround soundfield. Thus, a multichannel spatial audio signal rendering a surround soundfield can be represented by a small soundfield rendered with fewer channels, and additional spatial cues are not required. A typical S3AC application is then introduced, where a 5.1-channel surround audio signal is efficiently represented by a stereo downmix signal, which renders a ‘squeezed’ version of the original surround soundfield. This stereo signal is backward compatible with a conventional audio system, but can also be exploited to recover the original surround soundfield.
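The squeezing idea can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis: it assumes a single source per time-frequency component, a linear mapping of the 360° source azimuth onto a 60° stereo stage (loudspeakers assumed at ±30°), and the standard tangent amplitude-panning law. All function and constant names are hypothetical.

```python
import math

# Assumptions for illustration only (not taken from the thesis):
SURROUND_HALF_RANGE_DEG = 180.0  # source azimuth spans -180..180 degrees
STEREO_HALF_RANGE_DEG = 30.0     # stereo loudspeakers assumed at +/-30 degrees


def squeeze_azimuth(az_deg):
    """Linearly map a surround azimuth onto the narrower stereo stage."""
    return az_deg * STEREO_HALF_RANGE_DEG / SURROUND_HALF_RANGE_DEG


def unsqueeze_azimuth(az_deg):
    """Inverse of squeeze_azimuth: expand a stereo azimuth back to surround."""
    return az_deg * SURROUND_HALF_RANGE_DEG / STEREO_HALF_RANGE_DEG


def pan_gains(squeezed_az_deg):
    """Tangent panning law: (gL - gR) / (gL + gR) = tan(phi) / tan(phi0).

    Positive azimuth is taken to mean 'towards the left loudspeaker'.
    Gains are normalised so that gL**2 + gR**2 == 1.
    """
    ratio = (math.tan(math.radians(squeezed_az_deg))
             / math.tan(math.radians(STEREO_HALF_RANGE_DEG)))
    g_left, g_right = 1.0 + ratio, 1.0 - ratio
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm


def estimate_azimuth(g_left, g_right):
    """Invert the tangent law to estimate the squeezed azimuth from gains."""
    ratio = (g_left - g_right) / (g_left + g_right)
    return math.degrees(
        math.atan(ratio * math.tan(math.radians(STEREO_HALF_RANGE_DEG))))


# Round trip: a source at 110 degrees is squeezed, panned to stereo,
# re-estimated from the stereo gains, and expanded back.
g_l, g_r = pan_gains(squeeze_azimuth(110.0))
recovered = unsqueeze_azimuth(estimate_azimuth(g_l, g_r))
```

Under these assumptions the stereo pair carries the localisation information in its own inter-channel level difference, which is why no separate side information is needed to expand the soundfield again.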
The proposed S3AC approach is then further analyzed. The localisation resolution in the S3AC squeezed soundfield is analyzed and is shown to be frequency and sound source dependent. The limitation of the squeezing process is then derived and evaluated. To further reduce the required bandwidth, a mono downmix is introduced for S3AC, with the source localisation information represented by S3AC cues. Compared with the cues used in other spatial audio coding approaches, S3AC cues have the advantage of directly representing localisation information. Thus, an efficient S3AC cue quantisation solution based on psychoacoustic localisation principles is presented. In addition, a sound source localisation estimation algorithm is introduced, which can be used with an arbitrary multichannel audio format for extended flexibility.
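Because an S3AC cue is a direction rather than an inter-channel statistic, it can be quantised non-uniformly, with finer steps where human localisation is most acute (in front of the listener) and coarser steps towards the sides and rear. The Python sketch below only illustrates that principle; the region boundaries and step sizes are hypothetical values chosen for the example, not the quantiser designed in the thesis.

```python
# Hypothetical (upper azimuth bound in degrees, step size in degrees) per region.
# Finer steps in front, coarser steps to the sides and rear.
REGIONS = [(30.0, 2.0), (110.0, 5.0), (180.0, 10.0)]


def build_codebook(regions=REGIONS):
    """Build the list of reconstruction azimuths for 0..180 degrees."""
    levels = [0.0]
    for upper, step in regions:
        while levels[-1] + step <= upper:
            levels.append(levels[-1] + step)
    return levels


LEVELS = build_codebook()


def quantise(az_deg):
    """Return (sign, index) of the codebook level nearest to |az_deg|."""
    sign = -1 if az_deg < 0 else 1
    magnitude = abs(az_deg)
    index = min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - magnitude))
    return sign, index


def dequantise(sign, index):
    """Map a (sign, index) pair back to an azimuth in degrees."""
    return sign * LEVELS[index]


front = dequantise(*quantise(1.2))    # frontal source, 2-degree steps
rear = dequantise(*quantise(-147.0))  # rear source, 10-degree steps
```

With these example values the magnitude codebook has 39 levels, so a signed azimuth fits in 7 bits, while the worst-case error stays at 1° in the frontal region.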
Several additional S3AC applications are introduced. An efficient compression solution for Ambisonics B-format surround soundfield recordings is presented based on S3AC, which also extends the backward compatibility of Ambisonics signals. A binaural reproduction technique is also described for any S3AC encoded signal, providing a virtual surround sound experience over headphones. The S3AC soundfield squeezing idea is then exploited for a multi-party teleconferencing scenario, where soundfields from different remote sites are perceptually discriminated when played back at the local site. The S3AC squeezing limitation derived earlier is then further exploited for representing multiple surround soundfields with one S3AC downmix.
Finally, the S3AC approach is extended to compress multichannel three-dimensional audio signals. A source localisation estimation algorithm for an arbitrary 3D audio format is developed. The resulting source localisation is quantised using a 3D source localisation quantisation approach, which exploits psychoacoustic principles to minimise localisation distortion. An extended S3AC spatial squeezing algorithm is introduced for efficient and backward compatible representation of a 3D soundfield with a stereo downmix, while the 3D soundfield can also be represented by a mono downmix with accompanying S3AC cues that directly represent source localisation information.
Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.