Scopus Harvesting Series

INTER-MODALITY FUSION BASED ATTENTION FOR ZERO-SHOT CROSS-MODAL RETRIEVAL

Bela Chakraborty, University of Wollongong
Peng Wang, University of Wollongong
Lei Wang, University of Wollongong

Publication Name

Proceedings - International Conference on Image Processing, ICIP

Abstract

Zero-shot cross-modal retrieval (ZS-CMR) performs the task of cross-modal retrieval where the classes of test categories have a different scope than the training categories. It borrows the intuition from zero-shot learning which targets to transfer the knowledge inferred during the training phase for seen classes to the testing phase for unseen classes. It mimics the real-world scenario where new object categories are continuously populating the multi-media data corpus. Unlike existing ZS-CMR approaches which use generative adversarial networks (GANs) to generate more data, we propose Inter-Modality Fusion based Attention (IMFA) and a framework ZS INN FUSE (Zero-Shot cross-modal retrieval using INNer product with image-text FUSEd). It exploits the rich semantics of textual data as guidance to infer additional knowledge during the training phase. This is achieved by generating attention weights through the fusion of image and text modalities to focus on the important regions in an image. We carefully create a zero-shot split based on the large-scale MSCOCO and Flickr30k datasets to perform experiments. The results show that our method achieves improvement over the ZS-CMR baseline and self-attention mechanism, demonstrating the effectiveness of inter-modality fusion in a zero-shot scenario.

Open Access Status

This publication is not available as open access

Volume

2021-September

First Page

2648

Last Page

2652

Link to Full Text

COinS

Link to publisher version (DOI)

http://dx.doi.org/10.1109/ICIP42928.2021.9506182

Scopus Harvesting Series

INTER-MODALITY FUSION BASED ATTENTION FOR ZERO-SHOT CROSS-MODAL RETRIEVAL

Publication Name

Abstract

Open Access Status

Volume

First Page

Last Page

Link to publisher version (DOI)

Search

Browse

Links

Scopus Harvesting Series

INTER-MODALITY FUSION BASED ATTENTION FOR ZERO-SHOT CROSS-MODAL RETRIEVAL

Authors

Publication Name

Abstract

Open Access Status

Volume

First Page

Last Page

Share

Link to publisher version (DOI)

Search

Browse

Links