Multiview-Based 3-D Action Recognition Using Deep Networks
In multiview learning, views may be obtained from multiple sources or extracted from a single source as different features. This paper proposes effective multiple views of skeleton sequences, from which multiple networks learn discriminative features for three-dimensional human action recognition. Specifically, three views are constructed in the spatial domain and fed to a stack of long short-term memory networks to exploit temporal information, and three views are constructed using the improved joint trajectory maps and fed to three convolutional neural networks to exploit spatial information. Multiply fusion is used to combine the recognition scores of all views. The proposed method was evaluated on the widely used UTD-MHAD, MSRC-12 Kinect Gesture, and NTU RGB+D datasets, where it achieved state-of-the-art results.
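As an illustration of the score-fusion step, the following minimal sketch assumes that multiply fusion means taking the element-wise product of the per-view class score vectors and predicting the class with the largest fused score. The function name `multiply_fusion` and the example scores are hypothetical and are not taken from the paper; only three views are shown for brevity, whereas the method above combines six.

```python
from math import prod


def multiply_fusion(view_scores):
    """Fuse per-view class scores by element-wise multiplication.

    view_scores: list of score vectors, one per view, each holding one
    score per class (e.g. softmax outputs). Assumed layout, not the
    paper's own data format.
    Returns (predicted_class_index, fused_scores).
    """
    if not view_scores:
        raise ValueError("need at least one view")
    num_classes = len(view_scores[0])
    if any(len(scores) != num_classes for scores in view_scores):
        raise ValueError("all views must score the same number of classes")

    # Multiply the scores that every view assigns to the same class.
    fused = [prod(scores[c] for scores in view_scores)
             for c in range(num_classes)]
    # The predicted label is the class with the largest fused score.
    predicted = max(range(num_classes), key=fused.__getitem__)
    return predicted, fused


# Hypothetical scores for 3 classes from 3 views.
example_scores = [
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.5, 0.4, 0.1],
]
predicted_class, fused_scores = multiply_fusion(example_scores)
```

In this example the second view alone would pick class 1, but the product favors class 0 because the other two views score it highly. A product also penalizes any class that a single view scores near zero, which is one reason it can behave differently from averaging.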