The impact of Arabic part of speech tagging on sentiment analysis: A new corpus and deep learning approach
Publication Name
Procedia Computer Science
Abstract
Sentiment Analysis is achieved by using Natural Language Processing (NLP) techniques and finds wide applications in analyzing social media content to determine people's opinions, attitudes, and emotions toward entities, individuals, issues, events, or topics. The accuracy of sentiment analysis depends on automatic Part-of-Speech (PoS) tagging which is required to label words according to grammatical categories. The challenge of analyzing the Arabic language has found considerable research interest, but now the challenge is amplified with the addition of social media dialects. While numerous morphological analyzers and PoS taggers were proposed for Modern Standard Arabic (MSA), we are now witnessing an increased interest in applying those techniques to the Arabic dialect that is prominent in social media. Indeed, social media texts (e.g. posts, comments, and replies) differ significantly from MSA texts in terms of vocabulary and grammatical structure. Such differences call for reviewing the PoS tagging methods to adapt social media texts. Furthermore, the lack of sufficiently large and diverse social media text corpora constitutes one of the reasons that automatic PoS tagging of social media content has been rarely studied. In this paper, we address those limitations by proposing a novel Arabic social media text corpus that is enriched with complete PoS information, including tags, lemmas, and synonyms. The proposed corpus constitutes the largest manually annotated Arabic corpus to date, with more than 5 million tokens, 238,600 MSA texts, and words from Arabic social media dialect, collected from 65,000 online users' accounts. Furthermore, our proposed corpus was used to train a custom Long Short-Term Memory deep learning model and showed excellent performance in terms of sentiment classification accuracy and F1-score. The obtained results demonstrate that the use of a diverse corpus that is enriched with PoS information significantly enhances the performance of social media analysis techniques and opens the door for advanced features such as opinion mining and emotion intelligence.
Open Access Status
This publication may be available as open access
Volume
184
First Page
148
Last Page
155
Funding Number
R19099
Funding Sponsor
Zayed University