Doctor of Philosophy
School of Electrical, Computer and Telecommunications Engineering, Faculty of Engineering
Thomas-Kerr, Joseph Alfred, Building Babel: freeing multimedia processing and delivery from hard-coded formats, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2009. http://ro.uow.edu.au/theses/4054
The amount of multimedia content available via the Internet, and the number of formats in which it is encoded, stored and delivered, continue to grow rapidly, as do the number and diversity of the devices and software applications that produce, process and consume such content. This constantly changing landscape presents an increasing challenge to interoperability, since more and more software and hardware must be upgraded as new formats are developed. However, many of the operations performed on multimedia content are similar across coding formats. Recognising this similarity, this thesis proposes several approaches to format-independent media processing, with an emphasis on content delivery. Format independence considerably simplifies interoperability, since support for a new content format may be provided by disseminating a data file, rather than requiring application and device providers to extend and modify their software and hardware.
A fundamental requirement for format-independence is the ability to describe the structure of any given format in a way that exposes how it may be fragmented for delivery or processing, and how other data important to the processing (for instance temporal or scalability parameters) can be extracted from the binary data. Several meta-syntax languages that perform this function to a greater or lesser degree are evaluated. Of these, the most suitable for general use in format-independent processors is found to be MPEG-21’s Bitstream Syntax Description Language (BSDL). Its general suitability notwithstanding, BSDL exhibits several critical flaws when used to describe and process modern content formats. In response, this thesis proposes several new features for the language which significantly reduce processing complexity and provide extensibility for complex data types. These features are implemented and validated using bitstreams of real-world length; with them, the language exhibits a linear response, processing content at approximately 10 times the speed of playback (on the particular test machine used) for videos up to one hour in duration.
Digital media increasingly encompasses a wide range of metadata, as well as collections of related content (a DVD and its “special features”, for instance). Several recent standards address generic virtual containers for such rich content. While these standards, which include MPEG-21 and TVAnytime, provide numerous tools for interacting with rich media objects, they do not provide a framework for streaming or delivery of such data. This thesis presents the Bitstream Binding Language (BBL), a format-independent tool that describes how multimedia content and metadata may be bound into delivery formats. Using a BBL description, a generic processor can map rich content (an MPEG-21 Digital Item, for example) into a streaming or static delivery format. BBL provides a universal syntax for fragmentation and packetisation of both XML and binary data, and allows new content and metadata formats to be delivered without requiring the addition of new software to the delivery infrastructure. The BBL framework is validated and tested against a number of application scenarios, including a format-independent streaming server, generic metadata syntax translation, virtual container assembly, and a format-independent hinter.
Finally, it is observed that much of the semantic metadata that is generated to describe multimedia content could also be used to improve the decisions that must be made in order to transmit it effectively. Indeed, methods have been proposed for using specific semantic concepts in the delivery process. However, until now, no high-level system has been proposed that is able to take arbitrary semantic metadata, and utilise it in the multimedia delivery decision-making process. This thesis proposes such a system. It combines the aforementioned semantic concepts with other existing work in Rate-Distortion Optimisation for multimedia delivery, scalable content formats, and syntax description, and then develops a generalised framework to permit an arbitrary range of semantic metadata and optimisation techniques to be utilised. This objective is accomplished by utilising schema languages to describe the details of any given content or metadata, so that declarative mapping rules can be specified for translating from format-specific data points to format-independent concepts that are directly used by the framework. This translation can then be performed using software or hardware that knows nothing about the specific format it is processing.
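As an illustration only, the idea of declarative mapping rules can be sketched in a few lines of Python. The sketch is not the schema-language mechanism developed in the thesis: plain dictionaries stand in for schema-based descriptions, and the format names, field names and the `to_generic` function are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical rule tables, one per format. Each maps a format-independent
# concept (key) to the format-specific field that carries it (value).
# In this sketch a new format is supported by adding data here, not code.
MAPPING_RULES: Dict[str, Dict[str, str]] = {
    "format_a": {"temporal_layer": "t_id", "quality_layer": "q_id"},
    "format_b": {"temporal_layer": "frame_rate_level", "quality_layer": "snr_level"},
}


def to_generic(format_name: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Translate format-specific fields into format-independent concepts.

    The function never refers to a format-specific field name directly;
    it only follows the rule table selected by format_name.
    """
    rules = MAPPING_RULES[format_name]
    return {
        concept: record[field]
        for concept, field in rules.items()
        if field in record  # fields the rules do not mention are ignored
    }


# Two records from different (hypothetical) formats.
a = to_generic("format_a", {"t_id": 2, "q_id": 0, "payload_size": 1400})
b = to_generic("format_b", {"frame_rate_level": 2, "snr_level": 0})
print(a)
print(a == b)
```

Under these assumptions, downstream code that consumes `temporal_layer` and `quality_layer` needs no knowledge of which format produced them.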
This thesis describes a particular embodiment of the semantic-aware multimedia delivery system which was implemented in order to verify its key assertions. It presents the results of subjective testing that was performed on several short news clips encoded using H.264/SVC scalable video coding and Scalable-To-Lossless (SLS) audio coding. Each clip was adapted to four target bitrates using each of two methods: (a) using the semantic-aware system to devote a greater proportion of the available bandwidth to that part of the content (audio or video) that was conveying more of the semantics at any given time; and (b) adapting at a constant bitrate with the same average rate as clip (a). Test participants were shown each pair of clips (a and b) in a random order and were asked to evaluate which was more successful at conveying the meaning of the story. The result of this subjective testing was a 72% preference for those clips which had been adapted so as to devote more bandwidth to the semantically important parts of the content.
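The contrast between methods (a) and (b) can be illustrated with a minimal sketch that assumes a simple proportional split of a fixed total rate. The thesis builds on Rate-Distortion Optimisation, which this sketch does not implement; the function name, the per-segment semantic weights and the `min_share` floor are hypothetical.

```python
from typing import Tuple


def allocate_bitrate(
    total_kbps: float,
    audio_weight: float,
    video_weight: float,
    min_share: float = 0.1,
) -> Tuple[float, float]:
    """Split total_kbps between audio and video by semantic weight.

    Each stream is guaranteed at least min_share of the total, so neither
    is starved completely (an assumption of this sketch).
    """
    if total_kbps <= 0:
        raise ValueError("total_kbps must be positive")
    if audio_weight < 0 or video_weight < 0:
        raise ValueError("weights must be non-negative")
    if not 0.0 <= min_share <= 0.5:
        raise ValueError("min_share must lie in [0, 0.5]")

    weight_sum = audio_weight + video_weight
    # With no semantic information at all, fall back to an even split.
    audio_share = 0.5 if weight_sum == 0 else audio_weight / weight_sum
    # Clamp so that both streams keep at least min_share of the total.
    audio_share = min(max(audio_share, min_share), 1.0 - min_share)

    audio_kbps = total_kbps * audio_share
    return audio_kbps, total_kbps - audio_kbps


# Hypothetical (audio, video) semantic weights for three time segments.
segments = [(0.8, 0.2), (0.3, 0.7), (0.5, 0.5)]
for audio_w, video_w in segments:
    audio_kbps, video_kbps = allocate_bitrate(512.0, audio_w, video_w)
    print(f"audio={audio_kbps:.1f} kbps, video={video_kbps:.1f} kbps")
```

Because every segment receives the same total, the average rate matches that of a constant-rate adaptation at the same total; only the division between audio and video changes over time.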