Offensive language identification with multi-task learning

Abstract

The widespread presence of offensive content is a major issue in social media. This has motivated the development of computational models to identify such content in posts or conversations. Most of these models, however, treat offensive language identification as an isolated task. Recently, a few datasets have been annotated with both post-level offensiveness and related phenomena such as offensive tokens, humor, and engaging content. This creates the opportunity to model related tasks jointly, which can improve the explainability of offensive language detection systems and potentially aid human moderators. This study proposes a novel multi-task learning (MTL) architecture that can predict: (1) offensiveness at both post and token levels in English; and (2) offensiveness and related subjective tasks such as humor, engaging content, and gender bias identification in multilingual settings. Our results show that the proposed multi-task learning architecture outperforms current state-of-the-art methods trained to identify offense at the post level. We further demonstrate that MTL outperforms single-task learning (STL) across different tasks and language combinations.
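
As a rough illustration of what modeling post-level and token-level offensiveness jointly can mean, the sketch below combines a post-level loss and an averaged token-level loss into one training objective. It is not the architecture from the article: the function names, the example probabilities, and the mixing weight `alpha` are all hypothetical, and a weighted sum of cross-entropy losses is only one common way to set up multi-task learning.

```python
import math


def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold])


def joint_loss(post_probs, post_gold, token_probs, token_gold, alpha=0.5):
    """Weighted sum of a post-level loss and a mean token-level loss.

    `alpha` is a hypothetical mixing weight chosen for illustration;
    it is not a value reported in the article.
    """
    post_loss = cross_entropy(post_probs, post_gold)
    token_loss = sum(
        cross_entropy(p, g) for p, g in zip(token_probs, token_gold)
    ) / len(token_gold)
    return alpha * post_loss + (1 - alpha) * token_loss


# Made-up predictions for a three-token post.
# Class 0 = not offensive, class 1 = offensive.
post_probs = [0.2, 0.8]                            # post-level prediction
token_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]  # one prediction per token
loss = joint_loss(post_probs, 1, token_probs, [0, 0, 1])
print(f"joint loss: {loss:.4f}")
```

In a setup like this, both losses would typically be backpropagated through a shared encoder, so that token-level supervision can inform the post-level decision and vice versa.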

Publication DOI: https://doi.org/10.1007/s10844-023-00787-z
Divisions: College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies > Applied AI & Robotics
College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies
Additional Information: Copyright © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10844-023-00787-z
Uncontrolled Keywords: Deep learning, Multi-task learning, Offensive language identification, Transformers, Software, Information Systems, Hardware and Architecture, Computer Networks and Communications, Artificial Intelligence
Publication ISSN: 1573-7675
Last Modified: 25 Apr 2024 07:25
Date Deposited: 19 May 2023 15:37
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2023-06
Published Online Date: 2023-04-29
Accepted Date: 2023-03-19
Authors: Zampieri, Marcos
Ranasinghe, Tharindu (ORCID Profile 0000-0003-3207-3821)
Sarkar, Diptanu
Ororbia, Alex

Download

Version: Accepted Version

Access Restriction: Restricted to Repository staff only until 29 April 2024.
