Scopus Harvesting Series

Trear: Transformer-Based RGB-D Egocentric Action Recognition

Authors

Xiangyu Li, Tianjin University
Yonghong Hou, Tianjin University
Pichao Wang, Alibaba Group, USA
Zhimin Gao, Zhengzhou University
Mingliang Xu, Zhengzhou University
Wanqing Li, University of Wollongong

Publication Name

IEEE Transactions on Cognitive and Developmental Systems

Abstract

In this article, we propose a transformer-based RGB-D egocentric action recognition framework, called Trear. It consists of two modules: 1) an interframe attention encoder and 2) a mutual-attentional fusion block. Instead of using optical flow or recurrent units, we adopt a self-attention mechanism to model the temporal structure of the data from different modalities. Input frames are cropped randomly to mitigate the effect of data redundancy. Features from each modality interact through the proposed fusion block and are combined through a simple yet effective fusion operation to produce a joint RGB-D representation. Experiments on two large egocentric RGB-D data sets, 1) THU-READ and 2) First-Person Hand Action, and one small data set, Wearable Computer Vision Systems, show that the proposed method outperforms state-of-the-art results by a large margin.
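The two modules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, and it assumes the "simple" fusion is an element-wise sum followed by a temporal average, which the abstract does not specify. The function names `interframe_self_attention` and `mutual_attention_fusion` are hypothetical, and the per-frame features are random placeholders.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of per-frame feature vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def interframe_self_attention(frames):
    """Each frame attends to every frame of the same modality."""
    return attention(frames, frames, frames)

def mutual_attention_fusion(rgb, depth):
    """Cross-modal attention in both directions, then a simple fusion.

    Assumption: the fusion is an element-wise sum averaged over frames.
    """
    rgb_att = attention(rgb, depth, depth)    # RGB queries attend to depth
    depth_att = attention(depth, rgb, rgb)    # depth queries attend to RGB
    n, dim = len(rgb_att), len(rgb_att[0])
    return [sum(rgb_att[t][j] + depth_att[t][j] for t in range(n)) / n
            for j in range(dim)]

# Placeholder features: 8 frames, 16 dimensions per modality.
random.seed(0)
rgb_feats = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
depth_feats = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]

joint = mutual_attention_fusion(interframe_self_attention(rgb_feats),
                                interframe_self_attention(depth_feats))
print(len(joint))  # 16
```

In this sketch the temporal structure is captured only by the attention weights between frames, so no optical flow or recurrent state is needed, which is the design choice the abstract emphasizes.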

Open Access Status

This publication may be available as open access

First Page

246

Last Page

252

Link to publisher version (DOI)

http://dx.doi.org/10.1109/TCDS.2020.3048883
