University of Wollongong Thesis Collection 2017+

Audio-visual Video Recognition Through Super Descriptor Tensor Decomposition and Low-rank and Sparse Representation

Muhammad Rizwan Khokher, University of Wollongong

Year

2018

Degree Name

Doctor of Philosophy

Department

School of Electrical, Computer, and Telecommunications Engineering

Abstract

This work deals with audio-visual video recognition using machine learning. A general audio-visual video recognition system first extracts auditory and visual feature descriptors, then represents the extracted bi-modal features using feature encoding techniques, and finally performs recognition using a machine learning classifier. This work adopts a similar pipeline and contributes to its first two major components: visual feature extraction and global feature representation.

Visual feature extraction is a vital step in video recognition. In general, visual feature extraction starts by detecting spatio-temporal interest points, the locations in a video where the features are most discriminative. Existing spatio-temporal interest point detectors have two main problems. Firstly, the detectors are either too sparse, which leads to loss of information, or too dense, which adds noise and complexity. Secondly, when the background is dynamic or the camera is moving, the detectors may extract irrelevant interest points that do not belong to any actual motion. To address these problems, a spatio-temporal interest point detector is designed to extract salient interest points within a region of interest where there is motion. In addition, video stabilization is integrated into the detector to handle camera motion and dynamic background.
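The idea of restricting interest points to a region of motion can be illustrated with a minimal sketch. This is not the detector proposed in the thesis. It assumes simple frame differencing as the motion cue, a fixed threshold, and a 4-neighbour local-maximum test as the saliency criterion, and it omits video stabilization entirely. All function names and the `thresh` parameter are hypothetical.

```python
# Illustrative sketch only: frame differencing stands in for the motion cue,
# and the thesis's actual detector and stabilization step are not reproduced.

def motion_mask(prev, curr, thresh):
    """Mark pixels whose absolute temporal difference exceeds thresh."""
    return [[abs(c - p) > thresh for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def bounding_box(mask):
    """Smallest (top, left, bottom, right) box covering all moving pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, moving in enumerate(row) if moving]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

def interest_points(prev, curr, thresh=20):
    """Return (row, col) points inside the motion region whose temporal
    difference is a strict local maximum over the 4-connected neighbours."""
    mask = motion_mask(prev, curr, thresh)
    box = bounding_box(mask)
    if box is None:
        return []  # no motion, so no interest points
    top, left, bottom, right = box
    height, width = len(curr), len(curr[0])
    diff = [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]
    points = []
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if not mask[r][c]:
                continue  # skip static pixels inside the box
            neighbours = [diff[rr][cc]
                          for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= rr < height and 0 <= cc < width]
            if all(diff[r][c] > n for n in neighbours):
                points.append((r, c))
    return points

if __name__ == "__main__":
    # Two tiny grayscale frames: one strong and one weaker change in the centre.
    prev = [[0] * 5 for _ in range(5)]
    curr = [row[:] for row in prev]
    curr[2][2] = 100
    curr[2][3] = 50
    print(interest_points(prev, curr))
```

In this toy example, both changed pixels fall inside the motion region, but only the stronger one survives the local-maximum test, which is one simple way to avoid a detector that is too dense.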

Recommended Citation

Khokher, Muhammad Rizwan, Audio-visual Video Recognition Through Super Descriptor Tensor Decomposition and Low-rank and Sparse Representation, Doctor of Philosophy thesis, School of Electrical, Computer, and Telecommunications Engineering, University of Wollongong, 2018. https://ro.uow.edu.au/theses1/392

FoR codes (2008)

0801 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING, 0906 ELECTRICAL AND ELECTRONIC ENGINEERING

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.
