University of Wollongong

Inter-modality Fusion based Attention for Zero-shot Cross-modal Retrieval

thesis
posted on 2024-11-12, 14:35 authored by Bela Chakraborty
Advances in the internet and digital image sensors have led to a drastic increase in the volume of visual content. This content is generated and shared from various sources, such as medical archival photos, social media platforms, the education sector, and robotics. The growth has raised new challenges in serving users’ interests, notably in content-based image retrieval (CBIR), a long-established research area. As an initial step toward the final aim of this thesis, content-based image retrieval using a deep convolutional network was investigated to explore existing pooling mechanisms, diffusion mechanisms for image retrieval, and the challenges encountered in the task. Experimental results are reported on various image retrieval benchmark datasets and a community archival photo dataset. As part of the initial study, results on visual grounding, another computer vision task that has gained momentum, are also reported on the community archival photo dataset.

Year

2021

Thesis type

  • Masters thesis

Faculty/School

School of Computing and Information Technology

Language

English

Disclaimer

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.
