Learning Attentive Dynamic Maps (ADMs) for Understanding Human Actions
This paper presents a novel end-to-end trainable deep architecture that learns an attentive dynamic map (ADM) for understanding human motion from skeleton data. An ADM is intended not only to capture the dynamic information over the period of a human motion, referred to as an action, as a conventional dynamic image/map does, but also to embed the spatio-temporal attention needed to classify the action. Specifically, skeleton sequences are encoded into sequences of Skeleton Joint Maps (STMs), where each STM encodes both the joint locations (i.e. spatial information) and the relative temporal order (i.e. temporal information) of the skeleton in the sequence. The STM sequences are fed into a customized 3DConvLSTM to explore the local and global spatio-temporal information, from which a dynamic map is learned. This dynamic map is subsequently used to learn the spatio-temporal attention at each time step. ADMs are then generated from the learned attention weights and all hidden states of the 3DConvLSTM, and are used for action classification. The proposed method achieves competitive performance compared with state-of-the-art results on the Large Scale Combined dataset, the MSRC-12 dataset and the NTU RGB+D dataset.
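To make the final aggregation step concrete, the following is a minimal NumPy sketch of how attention weights and a stack of hidden states could be combined into a single map. It is not the architecture described above: the function name `attentive_dynamic_map`, the use of a plain temporal mean in place of the learned dynamic map, and the channel-wise dot product used as the attention score are all assumptions made for illustration, and the 3DConvLSTM itself is replaced by a random array of hidden states.

```python
import numpy as np

def attentive_dynamic_map(hidden):
    """Combine hidden states into one attention-weighted map.

    hidden: array of shape (T, C, H, W), one hidden state per time step.
    Returns (adm, weights) with shapes (C, H, W) and (T, H, W).
    """
    # Assumption: a temporal mean stands in for the learned dynamic map.
    dynamic_map = hidden.mean(axis=0)                    # (C, H, W)

    # Assumption: score each time step and location by the channel-wise
    # dot product between its hidden state and the dynamic map.
    scores = (hidden * dynamic_map[None]).sum(axis=1)    # (T, H, W)

    # Numerically stable softmax over time at every spatial location.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)        # (T, H, W)

    # Attention-weighted sum of all hidden states.
    adm = (weights[:, None] * hidden).sum(axis=0)        # (C, H, W)
    return adm, weights

# Stand-in for hidden states: 8 time steps, 4 channels, 5x5 maps.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((8, 4, 5, 5))
adm, weights = attentive_dynamic_map(hidden)
```

In this sketch the weights sum to one over time at each spatial location, so the resulting map is a convex combination of the hidden states at that location; a trained model would learn the scoring function rather than fix it as done here.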