Scopus Harvesting Series

Focal and Global Spatial-Temporal Transformer for Skeleton-Based Action Recognition

Zhimin Gao, Zhengzhou University
Peitao Wang, Zhengzhou University
Pei Lv, Zhengzhou University
Xiaoheng Jiang, Zhengzhou University
Qidong Liu, Zhengzhou University
Pichao Wang, Alibaba Group, USA
Mingliang Xu, Zhengzhou University
Wanqing Li, University of Wollongong

Publication Name

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

Despite the great progress achieved by transformers in various vision tasks, they remain underexplored for skeleton-based action recognition, with only a few attempts so far. Moreover, these methods compute pair-wise global self-attention equally for all joints in both the spatial and temporal dimensions, undervaluing the effect of discriminative local joints and short-range temporal dynamics. In this work, we propose a novel Focal and Global Spatial-Temporal Transformer network (FG-STFormer), which is equipped with two key components: (1) FG-SFormer, a focal joints and global parts coupling spatial transformer. It forces the network to focus on modelling correlations for both the learned discriminative spatial joints and the human body parts, respectively. The selected focal joints eliminate the negative effect of non-informative joints when accumulating the correlations. Meanwhile, the interactions between the focal joints and body parts are incorporated via mutual cross-attention to enhance the spatial dependencies. (2) FG-TFormer, a focal and global temporal transformer. Dilated temporal convolution is integrated into the global self-attention mechanism to explicitly capture the local temporal motion patterns of joints or body parts, which is found to be vitally important for making the temporal transformer work. Extensive experimental results on three benchmarks, namely NTU-60, NTU-120 and NW-UCLA, show that FG-STFormer surpasses all existing transformer-based methods and compares favourably with state-of-the-art GCN-based methods.
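The abstract describes its two components only at a high level. The following pure-Python sketch illustrates the general ideas under stated assumptions and is not the authors' implementation. All function names (select_focal_joints, pool_parts, focal_global_spatial, dilated_temporal_conv, focal_global_temporal) are hypothetical. The sketch uses single-head attention with identity query/key/value projections and no learned weights. Per-joint scores are passed in rather than learned, and body parts are given as lists of joint indices. The local dilated-convolution branch is simply added to the global self-attention output, because the abstract does not say how the two are integrated.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]
Mat = List[Vec]


def softmax(xs: Sequence[float]) -> Vec:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Mat, keys: Mat, values: Mat) -> Mat:
    """Scaled dot-product attention; assumption: identity Q/K/V projections, one head."""
    scale = math.sqrt(len(keys[0]))
    out: Mat = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def select_focal_joints(scores: Sequence[float], k: int) -> List[int]:
    """Hypothetical: keep indices of the k highest-scoring joints (scores are inputs here)."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def pool_parts(joints: Mat, parts: Sequence[Sequence[int]]) -> Mat:
    """Hypothetical: average the features of the joints belonging to each body part."""
    dim = len(joints[0])
    return [[sum(joints[j][c] for j in part) / len(part) for c in range(dim)]
            for part in parts]


def focal_global_spatial(joints: Mat, scores: Sequence[float],
                         parts: Sequence[Sequence[int]], k: int) -> Tuple[List[int], Mat, Mat]:
    """Hypothetical spatial step: self-attention on focal joints and on parts, then mutual cross-attention."""
    focal_idx = select_focal_joints(scores, k)
    focal = [joints[i] for i in focal_idx]
    part_feats = pool_parts(joints, parts)
    focal_self = attention(focal, focal, focal)              # correlations among focal joints only
    part_self = attention(part_feats, part_feats, part_feats)
    focal_out = attention(focal_self, part_self, part_self)  # focal joints attend to body parts
    part_out = attention(part_self, focal_self, focal_self)  # body parts attend to focal joints
    return focal_idx, focal_out, part_out


def dilated_temporal_conv(seq: Mat, kernel: Sequence[float], dilation: int) -> Mat:
    """1-D dilated convolution over time with one kernel shared by all channels, zero padding."""
    T, C, centre = len(seq), len(seq[0]), (len(kernel) - 1) // 2
    out: Mat = []
    for t in range(T):
        acc = [0.0] * C
        for i, w in enumerate(kernel):
            s = t + (i - centre) * dilation  # dilated tap position in time
            if 0 <= s < T:
                for c in range(C):
                    acc[c] += w * seq[s][c]
        out.append(acc)
    return out


def focal_global_temporal(seq: Mat, kernel: Sequence[float], dilation: int) -> Mat:
    """Hypothetical temporal step: local dilated-conv branch added to global self-attention."""
    local = dilated_temporal_conv(seq, kernel, dilation)
    glob = attention(seq, seq, seq)
    return [[a + b for a, b in zip(lr, gr)] for lr, gr in zip(local, glob)]
```

For example, with scores [0.1, 0.9, 0.5, 0.7] and k=2, select_focal_joints keeps joints [1, 3]. Only those two joints enter the focal self-attention, which mirrors the abstract's point that non-informative joints are excluded when correlations are accumulated.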

Open Access Status

This publication may be available as open access

Volume

13844 LNCS

First Page

155

Last Page

171

Funding Number

61906173

Funding Sponsor

National Natural Science Foundation of China

Link to publisher version (DOI)

http://dx.doi.org/10.1007/978-3-031-26316-3_10