Encoding multiple audio objects using intra-object sparsity



Publication Details

M. Jia, Z. Yang, C. Bao, X. Zheng & C. Ritz, "Encoding multiple audio objects using intra-object sparsity," IEEE Transactions on Audio, Speech and Language Processing, vol. 23, no. 6, pp. 1082-1095, 2015.


Preserving audio scenes in the form of audio objects has become common in recent years. Object-based audio techniques offer more flexibility for personalized rendering and represent audio object trajectories more accurately. This work presents a new compression framework for the lossy encoding and transmission of multiple simultaneously occurring audio objects. The proposed encoding approach is based on intra-object sparsity (approximate k-sparsity). After a quantitative measure of approximate k-sparsity is established, statistical analysis is used to validate the proposed intra-object sparsity of audio objects. By exploiting this intra-object sparsity, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. The downmix signal can be further compressed by legacy audio codecs, while the side information is transmitted losslessly. Objective and subjective evaluations with up to eight audio objects showed that the proposed compression framework achieves better perceptual quality than an existing technique. The subjective evaluations also confirmed that the proposed approach can scale its transmission to the available bandwidth while preserving the perceptual quality of both the individual audio objects and the spatial audio scenes.
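
The abstract does not give the sparsity measure or the downmix rule, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes each object is available as one frame of transform coefficients, uses the share of energy held by the k largest coefficients as a stand-in sparsity measure, and builds the mono downmix by keeping, in every bin, the coefficient of the object with the largest magnitude. The function names `energy_compaction`, `downmix_frame` and `separate_frame` are hypothetical.

```python
def energy_compaction(coeffs, k):
    """Share of total energy held by the k largest-magnitude coefficients.

    Assumed stand-in for a sparsity measure: a value close to 1.0 means the
    frame is approximately k-sparse.
    """
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    total = sum(energies)
    if total == 0.0:
        return 1.0  # a silent frame is trivially sparse
    return sum(energies[:k]) / total


def downmix_frame(objects):
    """Merge one frame of several objects into a mono frame plus side info.

    objects: list of equal-length coefficient lists, one list per object.
    Assumed rule: each bin keeps the coefficient of the dominant object, and
    the side information records which object that was.
    """
    downmix, side_info = [], []
    for b in range(len(objects[0])):
        owner = max(range(len(objects)), key=lambda i: abs(objects[i][b]))
        downmix.append(objects[owner][b])
        side_info.append(owner)
    return downmix, side_info


def separate_frame(downmix, side_info, num_objects):
    """Route each downmix bin back to the object named in the side info."""
    recovered = [[0.0] * len(downmix) for _ in range(num_objects)]
    for b, (value, owner) in enumerate(zip(downmix, side_info)):
        recovered[owner][b] = value
    return recovered


if __name__ == "__main__":
    obj_a = [4.0, 0.0, 0.1, 0.0]
    obj_b = [0.0, 0.2, 3.0, 0.0]
    mono, side = downmix_frame([obj_a, obj_b])
    print(mono, side)
    print(separate_frame(mono, side, 2))
```

In this sketch the recovered objects lose every coefficient that was not dominant in its bin, which is why such a scheme is lossy and why it relies on each object concentrating its energy in few coefficients.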



