Year

2011

Degree Name

Doctor of Philosophy

Department

University of Wollongong. School of Electrical, Computer and Telecommunications Engineering

Recommended Citation

Cheng, Bin, Spatial squeezing techniques for low bit-rate multichannel audio coding, Doctor of Philosophy thesis, University of Wollongong. School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2011. https://ro.uow.edu.au/theses/3243

Abstract

In recent years, significant research has focused on the efficient compression and representation of multichannel spatial audio signals. Recent developments in this area exploit spatial audio cues that represent inter-channel mathematical relationships. The original multichannel signal is downmixed to a backward-compatible mono or stereo signal, while the spatial cues are used to recover the surround sound. These approaches have been shown to provide efficient coding of multichannel spatial audio signals, in terms of both bit-rate reduction and perceptual quality. However, they have drawbacks: the spatial cues do not represent perceptually relevant information, which can result in inefficient quantisation as well as perceptual distortion of the localisation characteristics of the sound field. Furthermore, because the downmixing and spatial cue derivation algorithms in these approaches are designed specifically for a certain multichannel audio format, their flexibility and extensibility for coding future multichannel audio formats are limited.

Spatially Squeezing Surround Audio Coding (S³AC) is presented in this thesis as an alternative, efficient solution for the representation of spatial audio signals. Based on estimating sound source and localisation information in the spatial sound field, the fundamental idea in S³AC is to represent a surround soundfield with a ‘squeezed’ soundfield by exploiting perceptual localisation irrelevancy. In particular, it is shown that only limited precision is required to represent the localisation information of a surround soundfield without perceptual distortion, and that the localisation precision computationally derivable for a small soundfield is adequate to preserve the perceptual localisation information of a surround soundfield. Thus, a multichannel spatial audio signal rendering a surround soundfield can be represented by a small soundfield rendered with fewer channels, and additional spatial cues are not required. A typical S³AC application is then introduced, in which a 5.1-channel surround audio signal is efficiently represented by a stereo downmix signal that renders a ‘squeezed’ version of the original surround soundfield. This stereo signal is backward compatible with a conventional audio system, but can also be exploited to recover the original surround soundfield.

The proposed S³AC approach is then analysed further. The localisation resolution in the S³AC squeezed soundfield is analysed and shown to be dependent on frequency and sound source. The limitation of the squeezing process is then derived and evaluated. To further reduce the required bandwidth, a mono downmix is introduced for S³AC, with the source localisation information represented by S³AC cues. Compared with the cues in other spatial audio coding approaches, the S³AC cues benefit from representing localisation information directly. Thus, an efficient S³AC cue quantisation solution based on psychoacoustic localisation principles is presented. In addition, a sound source localisation estimation algorithm is introduced, which can be used with an arbitrary multichannel audio format for extended flexibility.

Several additional S³AC applications are introduced. An efficient compression solution for Ambisonics B-format surround soundfield recordings is presented based on S³AC, which also extends the backward compatibility of Ambisonics signals. A binaural reproduction technique is also described for any S³AC encoded signal, providing a virtual surround sound experience over headphones. The S³AC soundfield squeezing idea is then applied to multi-party teleconferencing scenarios, where soundfields from different remote sites are perceptually discriminated when played back at the local site. The S³AC squeezing limitation derived earlier is then further exploited to represent multiple surround soundfields with one S³AC downmix.

Finally, the S³AC approach is extended to compress multichannel three-dimensional audio signals. A source localisation estimation algorithm for an arbitrary 3D audio format is developed. The resulting source localisation is quantised using a 3D source localisation quantisation approach, which exploits psychoacoustic principles to minimise localisation distortion. An extended S³AC spatial squeezing algorithm is introduced for efficient and backward-compatible representation of a 3D soundfield with a stereo downmix. The 3D soundfield can also be represented by a mono downmix with accompanying S³AC cues that directly represent source localisation information.

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.

University of Wollongong Thesis Collection 1954-2016

Spatial squeezing techniques for low bit-rate multichannel audio coding
