Faculty of Engineering and Information Sciences - Papers: Part A

Encoding and communicating navigable speech soundfields

Xiguang Zheng, University of Wollongong, Dolby Laboratories
Christian H. Ritz, University of Wollongong
Jiangtao Xi, University of Wollongong

RIS ID

103886

Publication Details

X. Zheng, C. Ritz & J. Xi, "Encoding and communicating navigable speech soundfields," Multimedia Tools and Applications, vol. 75, pp. 5183-5204, 2016.

Abstract

This paper describes a system for encoding and communicating navigable speech soundfields for applications such as immersive audio/visual conferencing, audio surveillance of large spaces and free viewpoint television. The system relies on recording speech soundfields using compact co-incident microphone arrays that are then processed to identify sources and their spatial location using the well-known assumption that speech signals are sparse in the time-frequency domain. A low-delay Direction of Arrival (DOA)-based frequency domain sound source separation approach is proposed that requires only 250 ms of speech signal. Joint compression is achieved through a previously proposed perceptual analysis-by-synthesis spatial audio coding scheme that encodes sources into a mixture signal that can be compressed by a standard speech codec at 32 kbps. By also transmitting side information representing the original spatial location of each source, the received mixtures can be decoded and then flexibly reproduced using loudspeakers at a chosen listening point within a synthesised speech scene. The system was implemented based on this framework for an example application encoding a three-talker navigable speech scene at a total bit rate of 48 kbps. Subjective listening tests were conducted to evaluate the quality of the reproduced speech scenes at a new listening point as compared to a true recording at that point. Results demonstrate the approach successfully encodes multiple spatial speech scenes at low bit rates whilst maintaining perceptual quality in both anechoic and reverberant environments.
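
The abstract does not give the details of the DOA-based separation step, so the following is only a minimal sketch of the general idea it relies on: if each time-frequency bin is dominated by a single talker, a direction can be estimated per bin and bins can be grouped by direction. The sketch assumes horizontal B-format signals (W, X, Y) from a coincident array with the convention W = s/sqrt(2), uses a standard intensity-vector estimate, and substitutes two bin-centred tones for speech. The function names, the 15 degree mask width, and the test angles are hypothetical and are not taken from the paper.

```python
import numpy as np

def encode_bformat(signal, azimuth_deg):
    """Encode a mono signal as horizontal B-format for a plane wave.

    Assumed convention (not from the paper): W = s / sqrt(2),
    X = s * cos(azimuth), Y = s * sin(azimuth).
    """
    az = np.deg2rad(azimuth_deg)
    w = signal / np.sqrt(2.0)
    x = signal * np.cos(az)
    y = signal * np.sin(az)
    return w, x, y

def doa_per_bin(W, X, Y):
    """Estimate an azimuth (degrees) for every frequency bin.

    Uses the real part of conj(W) * X and conj(W) * Y, which is
    proportional to (cos, sin) of the source azimuth when one source
    dominates the bin.
    """
    ix = np.real(np.conj(W) * X)
    iy = np.real(np.conj(W) * Y)
    return np.rad2deg(np.arctan2(iy, ix))

def mask_by_doa(W, doa_deg, target_deg, width_deg=15.0):
    """Keep only bins whose estimated azimuth lies near target_deg."""
    # Wrap the angular difference into [-180, 180) before comparing.
    diff = np.abs((doa_deg - target_deg + 180.0) % 360.0 - 180.0)
    return np.where(diff <= width_deg, W, 0.0)

# Two stand-in "talkers": tones centred on FFT bins 32 and 96, so they
# do not overlap in frequency (an idealised form of sparsity).
n = 1024
t = np.arange(n)
talker_a = np.sin(2 * np.pi * 32 * t / n)
talker_b = np.sin(2 * np.pi * 96 * t / n)

wa, xa, ya = encode_bformat(talker_a, 40.0)
wb, xb, yb = encode_bformat(talker_b, -70.0)

# Spectrum of one frame of the mixed recording.
W = np.fft.rfft(wa + wb)
X = np.fft.rfft(xa + xb)
Y = np.fft.rfft(ya + yb)

doa = doa_per_bin(W, X, Y)
separated_a = mask_by_doa(W, doa, target_deg=40.0)

print(round(doa[32], 1), round(doa[96], 1))
```

In this idealised case the estimates at bins 32 and 96 recover the two encoding angles, and the mask retains only the first talker's bin. Real speech overlaps in some bins and is affected by reverberation, so a practical system would accumulate estimates over many frames before assigning bins to sources.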

Grant Number

ARC/DP1094053

Additional Grant Number

http://purl.org/au-research/grants/ARC/DP1094053

Included in

Engineering Commons, Science and Technology Studies Commons

Link to publisher version (DOI)

http://dx.doi.org/10.1007/s11042-015-2989-3

Grant Link

http://purl.org/au-research/grants/ARC/DP1094053
