Posted on 2024-11-12, 10:40, authored by Saugata Bose
Detecting anomalous speech in short online texts is difficult for several reasons. The definition of anomalous speech is itself ambiguous, malicious linguistic tactics keep evolving, and nuanced expressions such as sarcasm or subtle deviations from the norm are hard to discern. Traditional methods that sort content into binary or multiclass categories struggle to capture the range of emotions and sentiments in everyday online communication. Data-related hurdles compound these difficulties: labeled datasets are scarce, the available datasets are imbalanced, and standardized benchmarks are lacking. To address these issues, our study introduces three models, each engineered to tackle a distinct facet of anomalous speech detection: DeepOC-AnomalyDetect, Deep One-Class Fine Tuning (DOCFT), and VarietyDetect. DeepOC-AnomalyDetect combines deep learning with an encoder and a recurrent neural network to capture intricate language nuances in short texts. It achieves significantly higher F1 scores than the best-performing binary-class models, with improvements ranging from 11.67% to 33.33%. DOCFT is a fully deep, end-to-end model fine-tuned to identify short anomalous content. Using One-Class SVM-style hyperplanes and transfer learning, it attains F1 scores of up to 0.89 and AUC values reaching 0.90, and it significantly lowers the false positive rate (FPR) and false negative rate (FNR) across diverse datasets. This consistency across variations underscores its superior ability to identify anomalous short-text content.
VarietyDetect takes a semi-supervised approach, combining self-training and transfer learning to make effective use of unlabeled data. It shows robust adaptability and consistent performance in short-text anomaly detection and is particularly strong at detecting offensive speech, with precision of 0.76, recall of 0.84, an anomaly-class F1 score of 0.79, and AUC of 0.78, while maintaining lower FPR and FNR than the core model. Together, these models advance anomaly detection within sentiment analysis by demonstrating the effectiveness of one-class classification, contributing to a safer and more secure digital environment.
Year: 2023
Thesis type: Doctoral thesis
Faculty/School: School of Computing and Information Technology
Language: English
Disclaimer: Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.