Sign language recognition via dimensional global–local shift and cross-scale aggregation
Neural Computing and Applications
Sign languages generally consist of a sequence of upper body gestures and are cooperative processes among various parts such as the hands, arms, and face. Therefore, the dynamics of the parts as well as the holistic appearance of the upper body and individual parts are essential for robust recognition. In this paper, a global–local representation (GLR) module is proposed to boost the spatiotemporal feature modeling. The GLR module is composed of global shift and local shift along the height, width, and temporal dimensions. Specifically, the global shift is applied to the entire feature map for holistic representation, while the local shift restricts itself to local patches to capture detailed features. Furthermore, a novel cross-scale aggregation module is designed to combine the global and local information in different dimensions. Extensive experimental results on three large-scale benchmarks, including WLASL, INCLUDE and LSA64, demonstrate that the proposed method achieves state-of-the-art recognition performance.
Open Access Status
This publication is not available as open access