Publication Details

Wang, P., Li, Z., Hou, Y. & Li, W. (2016). Action recognition based on joint trajectory maps using convolutional neural networks. Proceedings of the 2016 ACM on Multimedia Conference (pp. 102-106). New York, United States: ACM.


Convolutional Neural Networks (ConvNets) have recently shown promising performance in many computer vision tasks, especially image-based recognition. How to use ConvNets effectively for video-based recognition remains an open problem. In this paper, we propose a compact, simple yet effective method that encodes the spatiotemporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and adopts ConvNets to learn discriminative features from these images for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, namely the MSRC-12 Kinect gesture dataset (MSRC-12), the G3D dataset and the UTD multimodal human action dataset (UTD-MHAD), and achieved state-of-the-art results.
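The idea of encoding a 3D skeleton sequence into multiple 2D images can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything below is an assumption made for illustration: the function name `joint_trajectory_maps`, the choice of three axis-aligned projection planes, the 64-pixel image size, and the use of pixel intensity to mark temporal order. The sketch also plots only joint positions and does not connect them into continuous trajectories.

```python
import numpy as np

def joint_trajectory_maps(skeleton, size=64):
    """Project a skeleton sequence of shape (T, J, 3) onto three 2D maps.

    Hypothetical helper: the planes, image size and intensity-based time
    encoding are illustrative assumptions, not the paper's stated design.
    """
    skeleton = np.asarray(skeleton, dtype=float)
    num_frames = skeleton.shape[0]

    # Rescale each coordinate axis to integer pixel indices in [0, size - 1].
    lo = skeleton.min(axis=(0, 1))
    hi = skeleton.max(axis=(0, 1))
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    pix = np.rint((skeleton - lo) / span * (size - 1)).astype(int)

    # Intensity in (0, 1] grows with the frame index, so later frames
    # overwrite earlier ones and appear brighter.
    time_value = (np.arange(num_frames) + 1) / num_frames

    planes = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}
    maps = {}
    for name, (col_axis, row_axis) in planes.items():
        img = np.zeros((size, size))
        for t in range(num_frames):
            img[pix[t, :, row_axis], pix[t, :, col_axis]] = time_value[t]
        maps[name] = img
    return maps
```

Each returned map is an ordinary 2D array, so it could be fed to any image-based ConvNet after resizing to the input resolution that network expects.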


