A Semi-supervised Approach for Sentiment Analysis of Arab(ic+izi) Messages: Application to the Algerian Dialect

Abstract

In this paper, we propose a semi-supervised approach for sentiment analysis of Arabic and its dialects. The approach relies on a sentiment corpus that is constructed automatically and then reviewed manually by native speakers of the Algerian dialect. It consists of building and applying a set of deep learning algorithms to classify the sentiment of Arabic messages as positive or negative. It was applied to Facebook messages written in Modern Standard Arabic (MSA) as well as in the Algerian dialect (DALG, a low-resource dialect spoken by more than 40 million people), in both Arabic script and Arabizi. To handle Arabizi, we consider two options: transliteration (widely used in the research literature for handling Arabizi) and translation (never used in the research literature for handling Arabizi). To highlight the effectiveness of the semi-supervised approach, we carried out different experiments using both corpora for training (i.e. the corpus constructed automatically and the one that was reviewed manually). The experiments were run on several test corpora dedicated to MSA/DALG that were proposed and evaluated in the research literature. Both shallow and deep learning classifiers are used, such as Random Forest (RF), Logistic Regression (LR), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). These classifiers are combined with word embedding models such as Word2vec and fastText for sentiment classification. Experimental results (F1 score up to 95% for intrinsic experiments and up to 89% for extrinsic experiments) show that the proposed system outperforms existing state-of-the-art methodologies (the best improvement is up to 25%).
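
The results above are reported as F1 scores. As a reminder of what that metric measures, here is a minimal sketch of F1 for a single target class in a positive/negative setting. The abstract does not say which F1 variant (per-class, macro or micro) the authors report, so the per-class form, the function name `f1_score` and the toy labels are all assumptions made for illustration only; none of the numbers are from the paper.

```python
def f1_score(gold, predicted, target="positive"):
    """F1 for one target class: harmonic mean of precision and recall.

    Illustrative helper only (not the authors' evaluation code).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    # True positives: target class predicted and correct.
    tp = sum(1 for g, p in pairs if g == target and p == target)
    # False positives: target class predicted but gold label differs.
    fp = sum(1 for g, p in pairs if g != target and p == target)
    # False negatives: gold label is the target class but it was missed.
    fn = sum(1 for g, p in pairs if g == target and p != target)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, invented for this example.
gold = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
score = f1_score(gold, predicted)
```

With these toy labels, precision and recall are both 2/3, so the F1 score is also 2/3.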

Publication DOI: https://doi.org/10.1007/s42979-021-00510-1
Divisions: College of Business and Social Sciences > School of Social Sciences & Humanities > Aston Centre for Europe
Additional Information: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Uncontrolled Keywords: Original Research,Social Media Analytics and its Evaluation,Arabizi,Sentiment analysis,Arabic,Arabic dialect,Translation,Transliteration
Publication ISSN: 2661-8907
Last Modified: 19 Apr 2024 07:16
Date Deposited: 01 Mar 2021 10:21
Related URLs: https://link.sp ... 979-021-00510-1 (Publisher URL)
PURE Output Type: Article
Published Date: 2021-02-27
Accepted Date: 2020-11-27
Submitted Date: 2020-05-23
Authors: Guellil, Imane
Adeel, Ahsan
Azouaou, Faical
Benali, Fodil
Hachani, Ala-Eddine
Dashtipour, Kia
Gogate, Mandar
Ieracitano, Cosimo
Kashani, Reza
Hussain, Amir

Version: Published Version

License: Creative Commons Attribution
