Doctor of Philosophy
School of Electrical, Computer and Telecommunications Engineering, Faculty of Informatics
Que, Ying P, A paradigm for delivering a scalable and low-latency immersive voice over IP service to resource-constrained clients in distributed virtual environment, PhD thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2008. http://ro.uow.edu.au/theses/806
This thesis addresses the problem of developing a high-quality yet scalable multi-party voice communication service for the increasingly popular Distributed Virtual Environments (DVEs), exemplified by Multiplayer Online Games (MOGs). The social interactive experience of DVE users can be greatly enhanced if the users feel immersed in a realistic environment through high-fidelity visual scenes, auditory scenes and haptics. While recognising the primary role played by visual scenes and the consequent rapid progress made in the field of computer graphics, in this thesis we investigate the network delivery of live DVE user voices, which will be an important supplement to the primary visual scenes in creating a sense of immersion for DVE users. The Immersive Voice over IP (VoIP) service is characterised by the creation of an auditory scene for each user, which is the personalised mix of all the voices within that user’s hearing range. Each constituent voice stream in an auditory scene is localised (directionally placed) and distance-attenuated in accordance with the visual positions of the corresponding users in the virtual world. We believe the auditory scenes created for Immersive VoIP will be much more conducive to users’ sense of immersion than the currently prevalent text-chat and mono VoIP applications. During our literature review, we identified three key challenges faced by an Immersive VoIP service provider, namely the need to reduce the voice processing cost, the voice exchange bandwidth cost and the voice transmission latency. In view of the limited resources on board DVE clients (especially wireless clients), we propose the concept of a server-rendered Immersive VoIP service, which employs dedicated servers to complete the computationally expensive voice rendering tasks on behalf of the clients.
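The per-listener auditory scene described above (a personalised mix of every voice within hearing range, each localised and distance-attenuated) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's renderer: it uses inverse-distance attenuation and constant-power stereo panning as simple stand-ins for proper localisation (such as HRTF filtering), and the names `Avatar`, `render_scene`, `HEARING_RANGE` and `REF_DISTANCE` are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters, not taken from the thesis.
HEARING_RANGE = 50.0  # voices beyond this virtual-world distance are not heard
REF_DISTANCE = 1.0    # distance at which the attenuation gain is 1.0


@dataclass
class Avatar:
    name: str
    x: float
    y: float
    heading: float = 0.0  # facing direction in radians, counterclockwise from +x


def azimuth(listener: Avatar, source: Avatar) -> float:
    """Direction of the source relative to the listener's heading, in [-pi, pi]."""
    angle = math.atan2(source.y - listener.y, source.x - listener.x) - listener.heading
    return math.atan2(math.sin(angle), math.cos(angle))  # wrap to [-pi, pi]


def render_scene(listener, speakers, frames, frame_len):
    """Mix every audible speaker's mono frame into one stereo frame for `listener`.

    frames maps speaker name -> list of `frame_len` samples.
    """
    left = [0.0] * frame_len
    right = [0.0] * frame_len
    for s in speakers:
        if s.name == listener.name:
            continue  # a listener does not hear their own voice
        d = math.hypot(s.x - listener.x, s.y - listener.y)
        if d > HEARING_RANGE:
            continue  # outside hearing range: not part of this auditory scene
        gain = REF_DISTANCE / max(d, REF_DISTANCE)  # inverse-distance attenuation
        # Positive azimuth is to the listener's left; pan = -1 full left, +1 full right.
        pan = -math.sin(azimuth(listener, s))
        theta = (pan + 1.0) * math.pi / 4  # constant-power panning law
        gain_l, gain_r = gain * math.cos(theta), gain * math.sin(theta)
        for i, sample in enumerate(frames[s.name]):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return left, right
```

For a listener at the origin facing +x, a speaker at (0, 10) is placed fully in the left channel at gain 0.1, while a speaker at (100, 0) is dropped as out of range. A sine-based pan cannot distinguish front from back, which is one reason real localisation uses richer filtering; the per-listener loop also makes clear why rendering cost grows with the number of listeners.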
However, minimising the client-side resource load creates a server processing scalability problem: each Auditory Scene Creation (ASC) server could bear a prohibitively large processing load when supporting a large and dense user population. Our subjective listening tests showed that the further a listener is from the voice source, the greater the voice localisation error the listener can tolerate. This important result verifies the conjecture of a distance-governed, variable acceptable_localisation_error, which allows the same localised voice to be shared among nearby listening users. This conjecture allows significant reductions in voice processing cost through the computational reuse of voice localisation results. In addition to static Auditory Scene Creation, a mechanism known as the Transitional Deviation Reduction Algorithm has been devised to address the fact that Immersive VoIP users can be annoyed by angular shifts of more than 5 degrees between the voices localised at successive time instants for the same speaking avatar. To ensure low voice latency for the latency-critical Immersive VoIP service, we propose a two_overlay_hops distributed server architecture. In this architecture, the voice transmission path between any pair of communicating avatars is always two overlay hops, i.e., from the speaking client to the assigned ASC server (1st hop), then from the ASC server to the listening client (2nd hop).
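The computational reuse enabled by a distance-governed acceptable localisation error can be illustrated with a small grouping routine. The tolerance model in `acceptable_error_deg` (2 degrees plus 0.5 degrees per unit of distance, capped at 30) is entirely hypothetical and is not the curve measured in the listening tests; `share_localisations` is likewise an illustrative name. For one speaking avatar, listeners are visited nearest first, and each either reuses an existing localised stream whose azimuth lies within its own tolerance or triggers a new localisation.

```python
# Hypothetical tolerance model: NOT the thesis's measured curve.
def acceptable_error_deg(distance: float) -> float:
    """Acceptable localisation error (degrees) for a listener at `distance` from the speaker."""
    return min(30.0, 2.0 + 0.5 * distance)


def angular_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def share_localisations(listeners):
    """Group listeners of ONE speaking avatar so that localisations can be reused.

    listeners: iterable of (listener_id, azimuth_deg, distance) relative to the speaker.
    Returns a list of (rendered_azimuth_deg, [listener_ids]); each entry is one
    localisation computation whose output is shared by all listed listeners.
    """
    renders = []
    # Nearest listeners first: they have the tightest tolerance, so their
    # exact azimuths become the shared renders that farther listeners reuse.
    for lid, az, dist in sorted(listeners, key=lambda t: t[2]):
        tol = acceptable_error_deg(dist)
        for rendered_az, ids in renders:
            if angular_diff_deg(rendered_az, az) <= tol:
                ids.append(lid)  # reuse: error is within this listener's tolerance
                break
        else:
            renders.append((az, [lid]))  # no render close enough: localise anew
    return renders
```

For instance, listeners at (azimuth, distance) of (10°, 2), (90°, 5), (12°, 40) and (80°, 60) need only two localisations instead of four: the two distant listeners reuse the renders made for the two nearby ones. Processing nearest listeners first is a design choice of this sketch, since their narrow tolerance would rarely allow them to reuse anyone else's render.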
As shown by our simulation results, this two_overlay_hops distributed server architecture can significantly improve voice transmission latency compared with the central server architecture and the prior distributed server solution devised in (Nguyen, 2006), especially in the challenging yet realistic DVE scenario where the distribution of avatars in the virtual world is only weakly correlated with the distribution of physical clients in the underlying geographically dispersed network. The objective of our server selection/assignment Linear Programming (LP) formulations is to balance improving the voice transmission latency against mitigating the associated increases in the client-side voice upload bandwidth cost and the server-side voice processing cost. In particular, the Bandwidth Constrained Formulation reduces the client-side voice upload cost while still achieving a satisfactory level of latency performance, by establishing a few ASC server sites that are highly Latency Efficient, i.e., capable of meeting the acceptable_latency_constraints of a large number of communicating avatar pairs. However, the LP-based server assignment formulations were proven to be NP-hard and are computationally expensive if executed in their original centralised form. Consequently, an alternative scheme was devised that divides the virtual world into equal-size partitions (measured by the number of avatar pairs enclosed) so as to parallelise and distribute the execution of the server assignment formulations across partitions. We examined the impact of avatar mobility in the virtual world on the performance of our distributed server assignment solutions, in particular on their ability to balance improving the voice latency against reducing the associated rises in client-side upload bandwidth cost. Our simulations showed that the ASC servers need to be reassigned at a regular frequency.
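The two-overlay-hop latency and the notion of a Latency Efficient server site can be sketched as below. Assumptions: client-to-server delays are symmetric and given in milliseconds in a dict `delay[client][server]`; each communicating pair is a (speaker, listener) tuple; and a single `bound_ms` stands in for the per-pair acceptable_latency_constraints. The function names are hypothetical, and this only evaluates candidate sites; it is not the LP formulation itself.

```python
def two_hop_latency(delay, speaker, server, listener):
    """Speaker -> ASC server (1st hop) plus ASC server -> listener (2nd hop), in ms.

    Assumes symmetric delays stored as delay[client][server].
    """
    return delay[speaker][server] + delay[listener][server]


def latency_efficiency(delay, pairs, servers, bound_ms):
    """Number of (speaker, listener) pairs whose latency bound each server site meets."""
    return {
        k: sum(1 for s, l in pairs if two_hop_latency(delay, s, k, l) <= bound_ms)
        for k in servers
    }


def fastest_server(delay, pair, servers):
    """Server giving the lowest two-hop latency for a single pair."""
    s, l = pair
    return min(servers, key=lambda k: two_hop_latency(delay, s, k, l))
```

For example, with delays A→S1 20 ms, A→S2 90 ms, B→S1 30 ms, B→S2 40 ms, C→S1 140 ms, C→S2 50 ms, and a hypothetical 150 ms bound on the pairs (A, B), (B, C) and (A, C), site S2 meets all three constraints while S1 meets only one. S2 is therefore the more Latency Efficient site, even though S1 is the fastest server for the pair (A, B).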
Despite the significant scalability improvements made by our virtual world partitioning scheme, the optimised server assignment formulations, especially the complex yet practical Bandwidth Constrained Formulation, are still not scalable enough to meet the execution frequencies required to cope with avatar mobility. To this end, we have derived the Bandwidth Reduced Heuristic, which executes much faster than the optimal server assignment mechanism at the expense of a small performance loss in both latency and client-side upload bandwidth cost. By combining our virtual world partitioning scheme with the Bandwidth Reduced Heuristic, we have produced a distributed server assignment solution that enables the Immersive VoIP service to cope with frequent virtual world mobility across a wide range of DVE scenarios with varying user population sizes and/or distribution densities. The only exception is a very sparse DVE, which could require the ASC servers to be reassigned once every 60-70 seconds. Such a high frequency is unrealistic considering the combined delays of server reassignment and handover. Consequently, a very sparse DVE with low-intensity communication flows is probably best served by a simple Peer-to-Peer (P2P) architecture with minimal management overhead rather than by a client-server architecture.
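The abstract does not spell out the Bandwidth Reduced Heuristic, so the following is a swapped-in stand-in rather than that heuristic: a greedy, set-cover-style assignment that repeatedly opens the server meeting the latency bound for the most still-unassigned avatar pairs. Opening few servers tends to concentrate each speaker's pairs on few sites and so reduces the number of voice streams each speaker must upload. Pairs that no server can satisfy fall back to their lowest-latency server. `greedy_assign`, `upload_streams` and the `delay[client][server]` layout (symmetric delays in ms) are hypothetical.

```python
def greedy_assign(delay, pairs, servers, bound_ms):
    """Greedy, set-cover-style ASC server assignment (a stand-in, not the thesis heuristic).

    delay[client][server] holds symmetric one-way delays in ms; pairs are
    (speaker, listener) tuples. Returns {pair: server}.
    """
    def latency(pair, k):
        s, l = pair
        return delay[s][k] + delay[l][k]  # two overlay hops via server k

    uncovered = set(pairs)
    assignment = {}
    while uncovered:
        # Pairs whose latency bound each server would meet right now.
        feasible = {k: {p for p in uncovered if latency(p, k) <= bound_ms} for k in servers}
        best = max(servers, key=lambda k: len(feasible[k]))
        if not feasible[best]:
            break  # no server can satisfy any remaining pair
        for p in feasible[best]:
            assignment[p] = best
        uncovered -= feasible[best]
    # Fallback: pairs no server can satisfy get their lowest-latency server.
    for p in uncovered:
        assignment[p] = min(servers, key=lambda k: latency(p, k))
    return assignment


def upload_streams(assignment):
    """Streams each speaker must upload: one per distinct ASC server it feeds."""
    per_speaker = {}
    for (speaker, _listener), server in assignment.items():
        per_speaker.setdefault(speaker, set()).add(server)
    return {s: len(ks) for s, ks in per_speaker.items()}
```

With delays A→S1 20 ms, A→S2 90 ms, B→S1 30 ms, B→S2 40 ms, C→S1 140 ms, C→S2 50 ms, and a 150 ms bound on the pairs (A, B), (B, C) and (A, C), all three pairs go to S2, so speaker A uploads a single stream. Assigning each pair to its individually fastest server would instead route (A, B) via S1 at 50 ms rather than 130 ms, but force A to upload two streams. This illustrates the latency-versus-upload trade-off that the heuristic and the formulations balance.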