A Two-Stream Neural Network for Pose-Based Hand Gesture Recognition

Publication Name

IEEE Transactions on Cognitive and Developmental Systems


Pose-based hand gesture recognition has been widely studied in recent years. Compared with full-body action recognition, hand gestures involve joints that are more closely distributed in space and collaborate more strongly. Capturing these complex spatial features therefore calls for an approach different from those used in action recognition. Moreover, many gesture categories, such as 'Grab' and 'Pinch,' have very similar motion or temporal patterns, which poses a challenge for temporal processing. To address these challenges, this article proposes a two-stream neural network: one stream is a self-attention-based graph convolutional network (SAGCN) that extracts short-term temporal information and hierarchical spatial information, and the other is a residual-connection-enhanced bidirectional independently recurrent neural network (IndRNN) that extracts long-term temporal information. In addition to the fixed topology and local feature extraction of GCNs, the SAGCN uses a dynamic self-attention mechanism to adaptively exploit the relationships among all hand joints. The proposed method thus takes advantage of both the GCN and the IndRNN to capture temporal-spatial information. Its effectiveness is validated on the widely used Dynamic Hand Gesture dataset (under two evaluation protocols) and the First-Person Hand Action dataset, where it achieves state-of-the-art recognition accuracies of 96.31%, 94.05%, and 90.26%, respectively.
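
The abstract names the building blocks but not their equations. The NumPy sketch below illustrates them under stated assumptions and is not the authors' implementation. The IndRNN update h_t = ReLU(W x_t + u * h_{t-1} + b), with one scalar recurrent weight per hidden unit, follows the general IndRNN formulation. The identity residual around a bidirectional IndRNN, adding a softmax joint-to-joint attention map to the normalized fixed adjacency, and fusing the two streams by averaging class probabilities are all assumptions made for illustration; the temporal convolution of the GCN stream is omitted. All function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def indrnn(x, W, u, b):
    """IndRNN recurrence h_t = relu(W x_t + u * h_{t-1} + b).

    x: (T, D_in), W: (D_in, H), u: (H,), b: (H,). Each hidden unit has its
    own scalar recurrent weight u[k], so units are independent over time.
    """
    h = np.zeros(W.shape[1])
    out = np.empty((x.shape[0], W.shape[1]))
    for t in range(x.shape[0]):
        h = relu(x[t] @ W + u * h + b)
        out[t] = h
    return out

def residual_bi_indrnn(x, fwd, bwd):
    """Bidirectional IndRNN with an identity residual (assumed design).

    fwd/bwd are (W, u, b) tuples with hidden size D_in // 2, so the
    concatenated output has the same width as x and can be added to it.
    """
    h_f = indrnn(x, *fwd)
    h_b = indrnn(x[::-1], *bwd)[::-1]  # run in reverse time, then realign
    return x + np.concatenate([h_f, h_b], axis=1)

def normalized_adjacency(edges, n):
    """Fixed topology with self-loops, normalized as D^-1/2 (A + I) D^-1/2."""
    A = np.eye(n)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def self_attention_gcn(X, A_hat, Wq, Wk, W):
    """One spatial graph-conv layer aggregating over the fixed adjacency
    A_hat plus a data-dependent attention map S over all joint pairs.

    X: (N_joints, C_in). Returns (N_joints, C_out) and S (N_joints, N_joints).
    """
    Q, K = X @ Wq, X @ Wk
    S = softmax(Q @ K.T / np.sqrt(Q.shape[1]), axis=1)  # each row sums to 1
    return relu((A_hat + S) @ X @ W), S

def fuse_scores(logits_gcn, logits_rnn):
    """Late fusion (assumed): average the two streams' class probabilities."""
    return 0.5 * (softmax(logits_gcn) + softmax(logits_rnn))

def make_indrnn_params(rng, d_in, hidden):
    # Recurrent weights in [0, 1) keep this toy recurrence from exploding.
    return (0.5 * rng.standard_normal((d_in, hidden)),
            rng.uniform(0.0, 1.0, hidden), np.zeros(hidden))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_joints, c_in, c_out, d_att = 5, 3, 4, 2
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # toy chain, not a real hand skeleton
    A_hat = normalized_adjacency(edges, n_joints)
    X = rng.standard_normal((n_joints, c_in))
    Y, S = self_attention_gcn(X, A_hat, rng.standard_normal((c_in, d_att)),
                              rng.standard_normal((c_in, d_att)),
                              rng.standard_normal((c_in, c_out)))
    print(Y.shape, np.allclose(S.sum(axis=1), 1.0))  # (5, 4) True

    T, D = 6, 4
    seq = rng.standard_normal((T, D))
    out = residual_bi_indrnn(seq, make_indrnn_params(rng, D, D // 2),
                             make_indrnn_params(rng, D, D // 2))
    print(out.shape)  # (6, 4)

    p = fuse_scores(np.array([2.0, 0.0, -1.0]), np.array([0.0, 1.0, 0.0]))
    print(int(p.argmax()))  # 0
```

The point of the per-unit scalar u is that each recurrent neuron only sees its own past state, which is what distinguishes an IndRNN from a fully connected RNN; the attention term S lets every joint aggregate from every other joint rather than only from skeleton neighbours.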

Open Access Status

This publication may be available as open access

Funding Sponsor

National Natural Science Foundation of China