Year

2018

Degree Name

Doctor of Philosophy

Department

School of Electrical, Computer, and Telecommunications Engineering

Abstract

This work addresses audio-visual video recognition using machine learning. A general audio-visual video recognition system first extracts auditory and visual feature descriptors, then represents the extracted bi-modal features using feature encoding techniques, and finally performs recognition with a machine learning classifier. This work adopts a similar pipeline, contributing to the first two major components: visual feature extraction and global feature representation.
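The three-stage pipeline described above can be sketched in miniature. The snippet below is an illustrative toy, not the method proposed in the thesis: it encodes per-modality local descriptors into bag-of-words histograms against a codebook and concatenates them into one global representation. The descriptors and codebooks are random placeholders; in practice the codebooks would be learned (e.g. by k-means) on training data.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_bow(descriptors, codebook):
    """Bag-of-words encoding: assign each local descriptor to its
    nearest codeword and return an L1-normalized histogram."""
    # Pairwise distances: (n_descriptors, n_codewords).
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = d.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Hypothetical local descriptors for one video, one set per modality.
audio_desc = rng.normal(0.0, 1.0, size=(50, 8))
visual_desc = rng.normal(2.0, 1.0, size=(120, 8))

# Placeholder codebooks (16 codewords per modality, stand-ins for
# codebooks learned from training data).
audio_codebook = rng.normal(0.0, 1.0, size=(16, 8))
visual_codebook = rng.normal(2.0, 1.0, size=(16, 8))

# Global representation: concatenate the per-modality histograms,
# ready to be fed to a classifier.
video_repr = np.concatenate([
    encode_bow(audio_desc, audio_codebook),
    encode_bow(visual_desc, visual_codebook),
])
print(video_repr.shape)  # (32,)
```

Each histogram sums to 1, so the concatenated representation is comparable across videos regardless of how many local descriptors each modality produced.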

Visual feature extraction is a vital step in video recognition. In general, visual feature extraction starts by detecting spatio-temporal interest points at which the features are most discriminative in a video. Existing spatio-temporal interest point detectors suffer from several problems. Firstly, the detectors are either too sparse, which leads to loss of information, or too dense, which introduces additional noise and complexity. Secondly, in the case of a dynamic background or a moving camera, the detectors may extract irrelevant interest points that do not belong to actual motion. To address these problems, a spatio-temporal interest point detector is designed to extract salient interest points only within regions of interest where there is motion. In addition, video stabilization is integrated into the detector to handle camera motion and dynamic backgrounds.
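The idea of restricting detection to regions where there is motion can be illustrated with a toy frame-differencing sketch. This is only a minimal assumption-laden stand-in, not the detector proposed in the thesis: the motion mask, salience measure, and thresholds are all illustrative, and video stabilization is omitted.

```python
import numpy as np

def detect_interest_points(prev_frame, frame, motion_thresh=0.2, top_k=5):
    """Toy motion-gated detector: take the absolute frame difference,
    keep only pixels inside the moving region (difference above a
    threshold), and return the most salient of those locations."""
    diff = np.abs(frame - prev_frame)
    mask = diff > motion_thresh              # crude region of interest: moving pixels
    salience = np.where(mask, diff, 0.0)     # zero out everything outside the mask
    flat = np.argsort(salience.ravel())[::-1][:top_k]
    ys, xs = np.unravel_index(flat, salience.shape)
    # Discard zero-salience fillers so only true motion pixels are returned.
    return [(int(y), int(x)) for y, x in zip(ys, xs) if salience[y, x] > 0]

# Two synthetic 8x8 grayscale frames: a bright 2x2 blob moves one pixel right.
prev_frame = np.zeros((8, 8)); prev_frame[3:5, 2:4] = 1.0
frame = np.zeros((8, 8));      frame[3:5, 3:5] = 1.0

points = detect_interest_points(prev_frame, frame)
print(points)  # points lie only where the blob appeared or disappeared
```

Static background pixels never enter the candidate set, which is the key property the thesis detector pursues; the overlap column of the blob also produces no response, since its intensity is unchanged between frames.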

FoR codes (2008)

0801 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING, 0906 ELECTRICAL AND ELECTRONIC ENGINEERING


Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.