Transformer guided geometry model for flow-based unsupervised visual odometry

Publication Name

Neural Computing and Applications

Abstract

Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information over a long sequence of images using recurrent neural networks. They are inaccurate, time-consuming to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that handle information from pairwise images and from a short sequence of images, respectively. For image sequences, a transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as the transformer-based auxiliary pose estimator (TAPE). Meanwhile, a flow-to-flow pose estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained through a simple yet effective consistency loss during training. Empirical evaluation shows that the proposed method outperforms state-of-the-art unsupervised learning-based methods by a large margin and performs comparably to supervised and traditional methods on the KITTI and Malaga datasets.

Open Access Status

This publication is not available as open access

Funding Number

61822701

Funding Sponsor

National Natural Science Foundation of China

Link to publisher version (DOI)

http://dx.doi.org/10.1007/s00521-020-05545-8