Learning Spatial-context-aware Global Visual Feature Representation for Instance Image Retrieval

Publication Name

Proceedings of the IEEE/CVF International Conference on Computer Vision

Abstract

In instance image retrieval, considering local spatial information within an image has proven effective for boosting retrieval performance, as demonstrated by geometric verification based on local visual descriptors. Nevertheless, it would be highly valuable to make ordinary global image representations spatial-context-aware, because global-representation-based image retrieval is appealing for its algorithmic simplicity, low memory cost, and compatibility with sophisticated data structures. To this end, we propose a novel feature learning framework for instance image retrieval that embeds local spatial context information into the learned global feature representations. Specifically, in parallel to the visual feature branch of a CNN backbone, we design a spatial context branch consisting of two modules, called online token learning and distance encoding. For each local descriptor learned by the CNN, the former module indicates the types of its surrounding descriptors, while the latter captures their spatial distribution. The visual feature branch and the spatial context branch are then fused to produce a single global feature representation per image. As experimentally demonstrated, the spatial-context-aware characteristic considerably improves the performance of global-representation-based image retrieval while maintaining all of its appealing properties. Our code is available at https://github.com/Zy-Zhang/SpCa.
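
The authors' actual implementation is available at the repository linked above. To make the two-branch idea concrete, below is a minimal, illustrative PyTorch sketch of how such an architecture could be wired together. All names and design choices here (`SpatialContextBranch`, `SpCaHead`, the softmax token assignment, the inverse-distance neighbour weighting, the 1x1-conv fusion, and the GeM-style pooling with p=3) are assumptions made for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialContextBranch(nn.Module):
    """Toy spatial context branch: soft-assigns each local descriptor to a
    set of learned tokens (token learning) and aggregates its neighbours'
    assignments with inverse-distance weights (distance encoding)."""

    def __init__(self, dim=512, num_tokens=64):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim))  # learnable codebook
        self.proj = nn.Linear(num_tokens, dim)  # map context back to feature space

    def forward(self, feats):
        # feats: (B, C, H, W) local descriptors from the CNN backbone.
        B, C, H, W = feats.shape
        x = feats.flatten(2).transpose(1, 2)              # (B, HW, C)
        # Token learning: soft assignment of each descriptor to the tokens.
        assign = F.softmax(x @ self.tokens.t(), dim=-1)   # (B, HW, K)
        # Distance encoding: weight neighbours by inverse spatial distance.
        ys, xs = torch.meshgrid(
            torch.arange(H, device=feats.device),
            torch.arange(W, device=feats.device),
            indexing="ij",
        )
        coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2).float()  # (HW, 2)
        weights = 1.0 / (1.0 + torch.cdist(coords, coords))            # (HW, HW)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        context = torch.matmul(weights, assign)           # (B, HW, K)
        out = self.proj(context)                          # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)


class SpCaHead(nn.Module):
    """Fuses the visual branch and the context branch, then pools the fused
    map into a single global descriptor (GeM-style pooling, p fixed at 3)."""

    def __init__(self, dim=512, num_tokens=64):
        super().__init__()
        self.context = SpatialContextBranch(dim, num_tokens)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, feats):
        fused = self.fuse(torch.cat([feats, self.context(feats)], dim=1))
        pooled = fused.clamp(min=1e-6).pow(3).mean(dim=(2, 3)).pow(1.0 / 3)
        return F.normalize(pooled, dim=-1)                # (B, C), L2-normalized


if __name__ == "__main__":
    head = SpCaHead(dim=512, num_tokens=64)
    feats = torch.randn(2, 512, 14, 14)   # stand-in for backbone feature maps
    print(head(feats).shape)              # torch.Size([2, 512])
```

Because everything after the backbone reduces to one fixed-length, L2-normalized vector per image, retrieval keeps the usual advantages of global representations: plain inner-product ranking and compatibility with approximate-nearest-neighbour indexes.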

Open Access Status

This publication is not available as open access

First Page

11216

Last Page

11225


Link to publisher version (DOI)

http://dx.doi.org/10.1109/ICCV51070.2023.01033