COTER: Conditional Optimal Transport meets Table Retrieval

Publication Name

WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining

Abstract

Ad hoc table retrieval refers to the task of performing semantic matching between given queries and candidate tables. In recent years, the approach to this retrieval task has shifted significantly, from hand-crafted features to Pre-Trained Language Models (PLMs). However, key challenges arise when candidate tables contain shared items and/or queries refer to only a subset of a table's items rather than the whole table. Existing models often struggle to distinguish the most informative items and fail to accurately identify the items needed to match the query. To bridge this gap, we propose the Conditional Optimal Transport based table retrievER (COTER). The proposed algorithm simplifies candidate tables: the semantic meaning of one or several words (from the original table) can be effectively "transported" to individual words (from the simplified table), conditioned on the query. COTER achieves two essential goals simultaneously: minimizing the semantic loss during table simplification and ensuring that the items retained in simplified tables effectively match the given query. Importantly, the theoretical foundation of COTER allows it to adapt dynamically to different queries and enhances the overall performance of table retrieval. Experiments on two popular Web-Table retrieval benchmarks show that COTER can effectively identify informative table items without sacrificing retrieval accuracy. This leads to a new state of the art, with substantial gains of up to 0.48 absolute Mean Average Precision (MAP) points over the previously reported best result.

Open Access Status

This publication may be available as open access

First Page

911

Last Page

919

Link to publisher version (DOI)

http://dx.doi.org/10.1145/3616855.3635796