University of Wollongong

Deep One-Class Learning for Anomalous Short-text Classification

thesis
posted on 2024-11-12, 10:40 authored by Saugata Bose
Detecting anomalous speech in short online texts presents multifaceted challenges. These challenges stem from the inherent ambiguity in defining what constitutes anomalous speech, the evolution of malicious linguistic tactics, and the difficulty of discerning nuanced expressions such as sarcasm or subtle deviations from the norm. Traditional methods, which usually frame the task as binary or multiclass classification, struggle to capture the wide range of emotions and sentiments expressed in everyday online communication. Data-related hurdles compound the problem: labeled datasets are scarce, the available datasets are imbalanced, and standardized benchmarks are lacking. To address these issues, this study introduces three models, each designed to tackle a distinct facet of anomalous speech detection: DeepOC-AnomalyDetect, Deep One-Class Fine Tuning (DOCFT), and VarietyDetect. DeepOC-AnomalyDetect integrates a deep-learning encoder with a recurrent neural network, which lets it capture intricate language nuances when detecting anomalies in short texts. It achieves substantially higher F1 scores than the best-performing binary-class models, with improvements ranging from 11.67% to 33.33%. DOCFT, a fully deep end-to-end model, is fine-tuned specifically to identify short anomalous content. By learning One-Class SVM-style hyperplanes and employing transfer learning, it attains F1 scores of up to 0.89 and AUC values of up to 0.90, and it substantially lowers the false positive rate (FPR) and false negative rate (FNR) across diverse datasets.
This consistency across datasets underscores DOCFT's strength in identifying anomalous short-text content. VarietyDetect, meanwhile, adopts a semi-supervised approach that combines self-training and transfer learning to make effective use of unlabeled data. It shows robust adaptability and consistent performance in short-text anomaly detection and is particularly effective at detecting offensive speech, achieving a precision of 0.76, a recall of 0.84, an anomaly-class F1 score of 0.79, and an AUC of 0.78, while maintaining a lower FPR and FNR than the core model. Together, these models demonstrate the effectiveness of one-class classification for anomaly detection in sentiment analysis and contribute to a safer and more secure digital environment.
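The abstract describes DOCFT as learning One-Class SVM-style hyperplanes. To illustrate only that underlying idea, the sketch below minimizes the primal objective of a linear One-Class SVM, `0.5*||w||^2 + 1/(nu*n) * sum_i max(0, rho - <w, z_i>) - rho`, by subgradient descent over fixed feature vectors. It is not the thesis implementation. The two-dimensional vectors are hypothetical stand-ins for sentence embeddings from a pretrained encoder, and in an end-to-end model such as DOCFT the encoder weights would presumably be fine-tuned through a loss of this kind rather than held fixed. The names `train_linear_ocsvm` and `score`, and the values of `nu`, `lr`, and `epochs`, are illustrative assumptions.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_linear_ocsvm(Z, nu=0.1, epochs=2000, lr=0.05):
    """Subgradient descent on the primal linear One-Class SVM objective
        0.5*||w||^2 + 1/(nu*n) * sum_i max(0, rho - <w, z_i>) - rho
    Z contains only "normal" training vectors (one-class setting).
    Hypothetical helper for illustration, not the thesis code."""
    n, d = len(Z), len(Z[0])
    w, rho = [0.0] * d, 0.0
    scale = 1.0 / (nu * n)
    for _ in range(epochs):
        # Training points on the anomalous side of the hyperplane <w, z> = rho.
        violators = [z for z in Z if dot(w, z) < rho]
        grad_w = [w[j] - scale * sum(z[j] for z in violators) for j in range(d)]
        grad_rho = scale * len(violators) - 1.0
        w = [w[j] - lr * grad_w[j] for j in range(d)]
        rho -= lr * grad_rho
    return w, rho


def score(w, rho, z):
    """Decision value: >= 0 looks normal, < 0 looks anomalous."""
    return dot(w, z) - rho


# Hypothetical stand-ins for encoder embeddings of normal short texts.
rng = random.Random(0)
normal = [[2.0 + rng.uniform(-0.3, 0.3), 2.0 + rng.uniform(-0.3, 0.3)]
          for _ in range(20)]
w, rho = train_linear_ocsvm(normal)
print(score(w, rho, [2.0, 2.0]), score(w, rho, [-2.0, -2.0]))
```

At the optimum, `nu` acts as an upper bound on the fraction of training vectors treated as outliers; a vector far from the normal cluster, such as `[-2, -2]`, falls on the anomalous side of the learned hyperplane and receives a negative score.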
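VarietyDetect is described as combining self-training with transfer learning to exploit unlabeled data. The sketch below shows only a generic self-training (pseudo-labeling) loop, under stated assumptions: a hypothetical nearest-centroid classifier on toy two-dimensional features replaces the thesis's transfer-learned text model, and the confidence threshold and round limit are arbitrary illustrative values. Each round fits on the labeled pool, assigns pseudo-labels to unlabeled points whose confidence clears the threshold, moves them into the labeled pool, and stops once no point qualifies.

```python
import math


def fit_centroids(X, y):
    """Hypothetical base learner: the mean feature vector of each class."""
    cents = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(c) / len(pts) for c in zip(*pts)]
    return cents


def predict_with_confidence(cents, x):
    """Return (label, confidence) from a softmax over negative distances."""
    labels = sorted(cents)
    neg_d = [-math.dist(x, cents[l]) for l in labels]
    m = max(neg_d)
    exps = [math.exp(v - m) for v in neg_d]  # shift by max for stability
    best = max(range(len(labels)), key=lambda i: exps[i])
    return labels[best], exps[best] / sum(exps)


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5):
    """Generic self-training loop (illustrative, not VarietyDetect itself)."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = fit_centroids(X_lab, y_lab)
        keep, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:
                X_lab.append(x)  # accept the confident pseudo-label
                y_lab.append(label)
                added += 1
            else:
                keep.append(x)   # leave for the next round
        pool = keep
        if added == 0:
            break
    return fit_centroids(X_lab, y_lab), pool


# 0 = normal, 1 = anomalous (e.g. offensive); toy features only.
X_lab = [[0.0, 0.0], [5.0, 5.0]]
y_lab = [0, 1]
X_unlab = [[0.5, 0.2], [4.6, 5.1], [2.5, 2.5]]
model, still_unlabeled = self_train(X_lab, y_lab, X_unlab)
print(still_unlabeled)
```

In this toy run the two unambiguous points are absorbed in the first round, while the point midway between the classes never clears the threshold and stays unlabeled. A high threshold trades coverage for fewer incorrect pseudo-labels.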
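The abstract reports precision, recall, anomaly-class F1, AUC, FPR, and FNR. As a reference for how such figures are conventionally computed, the sketch below derives them from binary labels (1 = anomalous) and anomaly scores, assuming AUC refers to the area under the ROC curve. It computes the AUC by pairwise comparison, which is equivalent to the normalized Mann-Whitney U statistic, with ties counted as one half. The function name `anomaly_metrics`, the 0.5 decision threshold, and the toy inputs are assumptions for illustration, not values from the thesis.

```python
def anomaly_metrics(y_true, scores, threshold=0.5):
    """Metrics for the anomaly class (label 1); higher score = more anomalous."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0  # normals flagged as anomalous
    fnr = fn / (fn + tp) if fn + tp else 0.0  # anomalies missed (1 - recall)
    # ROC AUC: chance a random anomaly outscores a random normal (ties = 0.5).
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = [1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg]
    auc = sum(wins) / len(wins) if wins else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1,
            "fpr": fpr, "fnr": fnr, "auc": auc}


# Toy example: three anomalies followed by four normal texts.
m = anomaly_metrics([1, 1, 1, 0, 0, 0, 0],
                    [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1])
print(m)
```

FPR and FNR depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are usually reported together.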

History

Year

2023

Thesis type

  • Doctoral thesis

Faculty/School

School of Computing and Information Technology

Language

English

Disclaimer

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.