MIRS: [MASK] Insertion Based Retrieval Stabilizer for Query Variations

Publication Name

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

Pre-trained Language Models (PLMs) have greatly pushed the frontier of document retrieval tasks. Recent studies, however, show that PLMs are vulnerable to query variations, i.e., queries containing misspellings, word re-orderings of the original queries, and so on. Despite increasing interest in robustifying retriever performance, the impact of query variations has not been fully explored. To address this problem effectively, this paper revisits Masked Language Modeling (MLM) and proposes a robust fine-tuning algorithm, termed [MASK] Insertion based Retrieval Stabilizer (MIRS). The proposed algorithm differs from existing methods by injecting [MASK] tokens into query variations and further encouraging representation similarity between pairs of original queries and their variations. In comparison to MLM, the traditional [MASK] substitution-then-prediction is less emphasized in MIRS. Additionally, an in-depth analysis of the algorithm is provided to reveal that: (1) the latent representation (or semantics) of the original query forms a convex hull, and the impact of a query variation is quantified as a "distortion" of this hull via deviation of the hull vertices; and (2) the inserted [MASK] tokens play a significant role in enlarging the intersection between the newly-formed hull (after variation) and the original one, thereby preserving more of the original query's semantics. With the proposed [MASK] injection, MIRS achieves an average improvement of 1.8 absolute MRR@10 points in retrieval accuracy, verified with 5 baselines across 3 public datasets and 4 types of query variations. We also provide extensive ablation studies to investigate hyperparameter sensitivity, to break the model down into individual components and demonstrate their efficacy, and to evaluate out-of-domain generalizability.
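
The abstract describes the core mechanism of MIRS: [MASK] tokens are inserted into a query variation, and the representation of the masked variation is encouraged to stay close to that of the original query. The snippet below is a minimal sketch of that idea, not the authors' implementation; the backbone model (bert-base-uncased), the insertion rate, the mean-pooling, and the cosine-similarity alignment term are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of [MASK]-insertion plus a
# representation-alignment term between an original query and its variation.
import random
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
encoder = AutoModel.from_pretrained("bert-base-uncased")

def insert_masks(query: str, insert_rate: float = 0.15) -> torch.Tensor:
    """Tokenize a query and insert [MASK] tokens at random positions."""
    ids = tokenizer(query, add_special_tokens=True)["input_ids"]
    out = []
    for tok in ids:
        out.append(tok)
        keep_special = tok in (tokenizer.cls_token_id, tokenizer.sep_token_id)
        if not keep_special and random.random() < insert_rate:
            out.append(tokenizer.mask_token_id)
    return torch.tensor([out])

def encode(input_ids: torch.Tensor) -> torch.Tensor:
    """Mean-pool the last hidden states into a single query embedding."""
    hidden = encoder(input_ids=input_ids).last_hidden_state
    return hidden.mean(dim=1)

original = "who wrote the declaration of independence"
variation = "who wrot the declaration of independance"  # misspelled variant

q_orig = encode(tokenizer(original, return_tensors="pt")["input_ids"])
q_var = encode(insert_masks(variation))

# Alignment term that would be added to the usual dense-retrieval ranking loss.
alignment_loss = 1.0 - F.cosine_similarity(q_orig, q_var).mean()
print(f"alignment loss: {alignment_loss.item():.4f}")
```

In a full fine-tuning loop, this alignment term would be combined with the retriever's standard contrastive/ranking objective; the weighting between the two is one of the hyperparameters examined in the paper's ablation studies.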

Open Access Status

This publication is not available as open access

Volume

14146 LNCS

First Page

392

Last Page

407

Funding Number

DP210101426

Funding Sponsor

Australian Research Council


Link to publisher version (DOI)

http://dx.doi.org/10.1007/978-3-031-39847-6_31