Human interaction recognition using low-rank matrix approximation and super descriptor tensor decomposition
RIS ID
115513
Additional Publication Information
ISBN: 9781509041176
Abstract
Audio-visual recognition systems rely on efficient feature extraction. Many spatio-temporal interest point detectors for visual feature extraction are either too sparse, leading to loss of information, or too dense, resulting in noisy and redundant information. Furthermore, interest point detectors designed for controlled environments can be affected by camera motion. In this paper, we propose a salient spatio-temporal interest point detector based on a low-rank and group-sparse matrix approximation. The detector handles camera motion through short-window video stabilization. The multimodal audio-visual features from multiple descriptors are represented by a super descriptor, from which a compact set of features is extracted through tensor decomposition and feature selection. This tensor decomposition retains the spatio-temporal structure among features obtained from multiple descriptors. Experimental validation is conducted on two benchmark human interaction recognition datasets: TVHID and Parliament. The experimental results show that the proposed approach outperforms many state-of-the-art methods, achieving classification rates of 74.7% and 88.5% on the TVHID and Parliament datasets, respectively.
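The following is a minimal, generic sketch of a low-rank plus group-sparse matrix decomposition, included only to illustrate the idea named in the abstract. It is not the authors' detector. It omits video stabilization and the actual interest point selection. It also assumes, hypothetically, that each column of the data matrix X is one vectorized frame and that each column forms one group. The helper names (`svt`, `group_shrink`, `low_rank_group_sparse`) and the weights `lam_nuc` and `lam_grp` are illustrative assumptions. The sketch alternates between two updates to minimize 0.5*||X - L - S||_F^2 + lam_nuc*||L||_* + lam_grp*sum_j ||S_j||_2. Singular value thresholding updates the low-rank "background" L, and column-wise group soft-thresholding updates the group-sparse part S. Columns with a nonzero S are treated as salient.

```python
import numpy as np


def svt(m, tau):
    """Singular value thresholding: exact minimizer of 0.5*||m - L||_F^2 + tau*||L||_*."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def group_shrink(r, tau):
    """Column-wise group soft-thresholding: exact minimizer of
    0.5*||r - S||_F^2 + tau * sum_j ||S[:, j]||_2 (one group per column)."""
    norms = np.linalg.norm(r, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return r * scale


def objective(x, low, sparse, lam_nuc, lam_grp):
    """Value of the (assumed) low-rank + group-sparse objective."""
    fit = 0.5 * np.linalg.norm(x - low - sparse) ** 2
    nuc = np.linalg.svd(low, compute_uv=False).sum()
    grp = np.linalg.norm(sparse, axis=0).sum()
    return fit + lam_nuc * nuc + lam_grp * grp


def low_rank_group_sparse(x, lam_nuc, lam_grp, n_iter=50):
    """Hypothetical helper: block coordinate descent over L and S."""
    low = np.zeros_like(x)
    sparse = np.zeros_like(x)
    history = []
    for _ in range(n_iter):
        low = svt(x - sparse, lam_nuc)           # update low-rank part with S fixed
        sparse = group_shrink(x - low, lam_grp)  # update group-sparse part with L fixed
        history.append(objective(x, low, sparse, lam_nuc, lam_grp))
    return low, sparse, history


# Toy data (illustrative only): a constant rank-1 "background" where each
# column is one vectorized frame, plus a zero-mean "event" in frames 5 and 17.
rows, cols = 20, 30
x = 3.0 * np.ones((rows, cols))
event = 5.0 * np.array([(-1.0) ** i for i in range(rows)])
x[:, 5] += event
x[:, 17] += event

low, sparse, history = low_rank_group_sparse(x, lam_nuc=40.0, lam_grp=10.0)
salient = np.flatnonzero(np.linalg.norm(sparse, axis=0) > 1e-8)
print(salient)
```

Each update is the exact minimizer of the objective over one block, so the recorded objective values never increase. On this toy input, the columns flagged in `salient` should be the two injected frames. The one-group-per-frame choice and the weight values are assumptions made for illustration. They are not taken from the paper.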
Grant Number
ARC/DP150104279
Publication Details
M. Khokher, A. Bouzerdoum & S. Phung, "Human interaction recognition using low-rank matrix approximation and super descriptor tensor decomposition," in 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017, pp. 1847-1851.