University of Wollongong

Simplicity is Key: Advancing NLP Models with Optimal Transport Theory

thesis
posted on 2025-05-15, 03:24 authored by Wangli Yang

The accurate understanding and interpretation of context is critical to the rapidly evolving field of Natural Language Processing (NLP). Human language, with its inherent complexity marked by ambiguity, redundant content, and subtle meanings, demands a comprehensive approach to contextual analysis. This becomes particularly crucial in downstream tasks such as Question Answering (QA) and Anomaly Detection (AD). For instance, in QA tasks, the ability to identify relevant contextual cues within the provided question and passage directly impacts the precision of generated answers. Similarly, in AD tasks, the challenge lies in accurately identifying outliers within textual data, which requires a deep understanding of context to differentiate genuine anomalies from normal data. These examples underscore the critical importance of context understanding in NLP and highlight the need for novel methods to capture semantic meaning from input content. This thesis therefore introduces an approach that leverages Optimal Transport (OT) theory to enhance model capacity for contextual analysis.

OT theory, originally developed to solve problems in economics and physics, provides a robust mathematical framework applicable across many disciplines. Specifically, it offers a powerful means of comparing and transporting probability distributions in a geometric space. Its relevance to NLP stems from its ability to quantify the distance between textual elements in a coherent and geometrically consistent manner, making it invaluable for understanding the semantic relationships and structures within natural language data. Accordingly, this thesis explores the application of OT theory to both Multiple-Choice Question Answering (MCQA) and Out-of-Domain (OOD) detection tasks.
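The core idea of comparing probability distributions by transport cost can be sketched in a few lines. The toy samples below are purely illustrative and are not from the thesis, which applies OT to high-dimensional text representations rather than scalar values; SciPy's 1-D Wasserstein distance is used here only to make the geometric notion of distance concrete.

```python
# Minimal sketch of the OT distance idea: the 1-D Wasserstein distance
# is the minimal cost of transporting the mass of one empirical
# distribution onto another.
from scipy.stats import wasserstein_distance

# Two toy samples standing in for distributions over textual features.
p = [0.0, 1.0, 2.0]
q = [0.5, 1.5, 2.5]

# q is p shifted by 0.5, so the optimal plan moves every point 0.5 units.
print(wasserstein_distance(p, q))  # 0.5
```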

First, for the MCQA task, OT facilitates a precise comparison of question and answer content by identifying and aligning their key elements. This alignment, driven by the cost of transporting one content element to another, captures contextual relationships beyond superficial embedding similarities. Consequently, only the identified key clues are used for the subsequent question answering, improving both the efficiency and the effectiveness of the model.
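The alignment described above can be illustrated with an entropic-regularized transport plan between token embeddings. The sketch below is a generic Sinkhorn iteration over hypothetical random embeddings, not the thesis's actual model or data; it shows only how a transport plan distributes each question token's mass over answer tokens, with high-mass entries suggesting key clues.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iters=200):
    """Entropic-regularized OT plan between histograms a, b under cost C."""
    K = np.exp(-C / reg)          # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):      # alternate scaling to match marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Hypothetical toy embeddings: 3 question tokens, 2 answer tokens, dim 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
A = rng.normal(size=(2, 4))
C = np.linalg.norm(Q[:, None] - A[None, :], axis=-1)  # pairwise cost

a = np.full(3, 1 / 3)  # uniform mass over question tokens
b = np.full(2, 1 / 2)  # uniform mass over answer tokens
P = sinkhorn(a, b, C)

# Each row of P shows how one question token aligns to the answer tokens.
print(P.round(3))
```

In practice the cost matrix would come from learned contextual embeddings, and the plan P would be inspected or thresholded to select the aligned key elements.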

Second, for the OOD detection task, the OT framework is used to measure and quantify geometric differences between distributions. This enables the identification of outliers based not only on their content but also on their contextual relevance, a distinction crucial for accurately separating genuine anomalies from normal instances. By leveraging OT, models can detect subtle textual variations that traditional methods may miss, reducing false positives and improving detection accuracy.
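The OOD scoring principle can be sketched as follows. This is a simplified stand-in, not the thesis's method: Gaussian samples play the role of projected in-domain and test-document features, and the OT (Wasserstein) distance to the in-domain reference serves as the anomaly score, with larger distances flagging out-of-domain inputs.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
# Hypothetical scalar features (e.g., projected embeddings):
in_domain = rng.normal(0.0, 1.0, size=500)   # in-domain reference set
doc_normal = rng.normal(0.1, 1.0, size=100)  # test doc close to in-domain
doc_ood = rng.normal(3.0, 1.0, size=100)     # shifted test doc (OOD)

# OT distance to the reference distribution acts as the anomaly score.
score_normal = wasserstein_distance(in_domain, doc_normal)
score_ood = wasserstein_distance(in_domain, doc_ood)

threshold = 1.0  # illustrative cut-off; tuned on validation data in practice
print(score_normal < threshold, score_ood > threshold)
```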

Extensive experimentation and comparative analysis are carried out in this thesis, revealing the superior performance of the OT-based approach over traditional methods. Overall, the findings highlight the considerable promise of OT within the NLP domain. Unlike conventional techniques, which typically rely on direct comparisons or surface-level feature matching, OT evaluates the structural and contextual coherence of text within the latent space. This capability enables OT to effectively tackle the complexities and subtleties inherent in human language, resulting in NLP models with enhanced precision, reliability, and contextual awareness.

History

Year

2024

Thesis type

  • Masters thesis

Faculty/School

School of Computing and Information Technology

Language

English

Disclaimer

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.
