Year

2014

Degree Name

Doctor of Philosophy

Department

School of Electrical, Computer and Telecommunications Engineering

Abstract

Free Viewpoint Video (FVV) aims to provide users with the ability to select arbitrary views of a dynamic scene in real time. FVV systems widely adopt simplified plenoptic signal representations, in particular the light field (LF); such a system is referred to in this thesis as an LF-based FVV system. An LF-based FVV system consists of three main components: acquisition, rendering, and compression/transmission. The efficacy of these components directly affects the quality of the output video.

The main aim of this research is to propose a novel theory and mathematical framework for the analytical comparison, evaluation, and optimization of the LF acquisition and rendering components under realistic conditions, namely an under-sampled LF and approximate depth information (depth maps containing errors). In contrast, most current research on LF analytical evaluation focuses on perfect signal reconstruction and is not adequate to objectively predict and assess the influence of imperfections in acquisition and rendering on the output video quality.

At the core of the proposed theory is the concept of effective sampling density (ESD). ESD is shown to be an analytically tractable metric that represents the combined impact of the imperfections of LF acquisition and rendering and can be used to directly predict/estimate output video quality from system parameters; this claim is verified by extensive numerical simulations. The ESDs for the commonly used LF acquisition configurations and rendering methods are derived and analyzed for evaluation and comparison. Furthermore, an empirical relationship between the rendering quality (in PSNR) of a system and its ESD is established to allow direct prediction of the overall video quality without actually implementing the system. A small-scale subjective user study is also conducted, and it indicates a high correlation between ESD and perceived quality.
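Since rendering quality is reported here in PSNR, a minimal sketch of the standard PSNR computation may help fix the quantity being predicted. The function name `psnr`, the flat-list image representation, and the default 8-bit peak value of 255 are assumptions made for illustration only. The empirical ESD-to-PSNR relationship itself is not stated in this abstract and is not reproduced.

```python
import math


def psnr(reference, rendered, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel values; `peak` is the maximum
    possible pixel value (255 assumes 8-bit images).
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error between ground truth and rendered view.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [100, 100, 100, 100]
    rendered_view = [101, 99, 101, 99]
    print(round(psnr(ground_truth, rendered_view), 2))
```

In an evaluation of this kind, `reference` would be the ground-truth image at the virtual viewpoint and `rendered` the view synthesized from the captured light field.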

In addition to the comparison and evaluation of LF acquisition and rendering components and the objective quality assessment of LF-based FVV systems, ESD theory is also applied to several other significant problems. The first problem is LF acquisition optimization. In particular, for a simplified regular-grid acquisition, this optimization reduces to calculating the number of cameras required to capture the scene. Existing methods calculate the Nyquist density by assuming a band-limited signal and perfect reconstruction of an arbitrary view using linear interpolation, which often results in an impractically high number of cameras. In contrast, employing ESD makes it possible to study the problem for an under-sampled LF under realistic conditions (non-Lambertian reflections and occlusions) and for rendering with complex interpolations. Theoretical and numerical results show that the resulting number of cameras is significantly lower than reported in previous studies, with only a few percent reduction in rendering quality. Moreover, it is shown that the previous methods are special cases of the one derived from ESD theory.

The second problem is LF rendering optimization. ESD theory is used to estimate the rendering complexity, expressed as the optimum number of rays employed in the interpolation algorithm, needed to compensate for the adverse effect of errors in depth maps at a given rendering quality. The proposed method is particularly useful for designing a rendering algorithm that must achieve a required rendering quality with inaccurate knowledge of depth.

The third problem is the joint optimization of LF acquisition and LF rendering to achieve a desired output quality. In particular, the trade-off among acquisition camera density, ray selection, depth error, and rendering quality is studied using ESD, and methods are presented to optimize these parameters for a system with a desired output quality, in terms of ESD or PSNR, by applying a Lagrangian method to ESD. Applying the proposed method to a regular-grid camera system shows that the number of cameras can be reduced by a factor of 8 if 32 rays, instead of 8 rays, are employed during rendering to achieve a similar rendering quality for a typical 20% error in depth estimation.

While the original presentation of ESD assumes a fixed scene complexity, the fourth problem focuses on scene complexity and on how a non-uniform/irregular acquisition can lead to a higher output quality. LF acquisition is theoretically considered a problem of plenoptic signal sampling. It is typically performed using a regular acquisition such as a regular camera grid. While a regular acquisition itself results in a non-uniform sampling density, this non-uniformity does not match the scene complexity and frequency variations. To address the fourth problem, ESD theory is combined with the scene complexity, and an irregular acquisition method is proposed for optimum non-uniform LF sampling that follows the variations of the scene complexity. Specifically, scene complexity is measured by analyzing the DCT coefficients of reference images of the scene, which describe the frequency behavior of the plenoptic signal over the scene space. An optimization model is formulated to calculate the optimum configuration of the acquisition cameras, including their positions and orientations. The theoretical analysis and numerical simulations demonstrate that the rendered video quality can be significantly improved (around 20% in mean PSNR) by employing the proposed irregular acquisition compared with the regular camera grid.
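As a rough illustration of how DCT coefficients can indicate local scene complexity, the sketch below computes an orthonormal 2D DCT-II of an image block and reports the fraction of its energy held by the non-DC coefficients. This particular score, and the names `dct2` and `ac_energy_fraction`, are hypothetical choices for illustration; the abstract does not specify the complexity measure actually used in the thesis.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Normalization that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def ac_energy_fraction(block):
    """Hypothetical complexity score: share of energy outside the DC term."""
    coeffs = dct2(block)
    total = sum(c * c for row in coeffs for c in row)
    if total == 0:
        return 0.0
    dc = coeffs[0][0] ** 2
    return (total - dc) / total


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    checker = [[255 if (x + y) % 2 else 0 for y in range(4)] for x in range(4)]
    print(ac_energy_fraction(flat), ac_energy_fraction(checker))
```

A flat block scores near zero and a highly textured block scores higher, so a score of this kind could be used to flag regions that warrant denser sampling.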

To validate the proposed theory, a simulation system is proposed. The simulator takes a 3D model of a scene and generates both reference camera images and ground-truth images. The proposed simulation system is flexible and efficient: it can automatically generate different datasets and objectively compare and analyze LF-based FVV systems for any given experiment design.

While the fundamentals of ESD theory are studied and reported in this thesis, the theory requires significant further research. The author is working on extending ESD theory and applying it to more problems, and will report the results in future publications.

FoR codes (2008)

080103 Computer Graphics, 080104 Computer Vision, 080106 Image Processing, 090609 Signal Processing

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.