OffensEval 2023: Offensive language identification in the age of Large Language Models

Abstract

The OffensEval shared tasks organized as part of SemEval-2019–2020 were very popular, attracting over 1300 participating teams. The two editions of the shared task helped advance the state of the art in offensive language identification by providing the community with benchmark datasets in Arabic, Danish, English, Greek, and Turkish. The datasets were annotated using the OLID hierarchical taxonomy, which has since become the de facto standard in general offensive language identification research and has been widely used beyond OffensEval. We present a survey of OffensEval and related competitions, and we discuss the main lessons learned. We further evaluate the performance of Large Language Models (LLMs), which have recently revolutionized the field of Natural Language Processing. We use zero-shot prompting with six popular LLMs and zero-shot learning with two task-specific fine-tuned BERT models, and we compare the results against those of the top-performing teams at the OffensEval competitions. Our results show that while some LLMs such as Flan-T5 achieve competitive performance, in general LLMs lag behind the best OffensEval systems.
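To make the zero-shot evaluation setup concrete, the following is a minimal sketch of how a prompting pipeline for OLID level A (offensive `OFF` vs. not offensive `NOT`) might be scored with macro-averaged F1, the metric used to rank OffensEval systems. The prompt wording, the `build_prompt` and `parse_response` helpers, and the rule for mapping free-text answers to labels are all hypothetical illustrations, not the prompts or parsing used in the article; the actual LLM call is left out and would be supplied by whichever model API is being tested.

```python
# OLID level A labels: offensive (OFF) vs. not offensive (NOT).
LEVEL_A_LABELS = ("OFF", "NOT")

# Hypothetical zero-shot prompt; NOT the wording used in the article.
PROMPT_TEMPLATE = (
    "Is the following social media post offensive? "
    "Answer with 'yes' or 'no'.\n\nPost: {text}\nAnswer:"
)


def build_prompt(text: str) -> str:
    """Fill the hypothetical template with a single post."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_response(response: str) -> str:
    """Map a free-text LLM answer to an OLID level A label.

    Assumption: any answer that does not start with 'yes' is treated
    as NOT, a simplifying rule for refusals or unparseable outputs.
    """
    return "OFF" if response.strip().lower().startswith("yes") else "NOT"


def macro_f1(gold: list, pred: list, labels=LEVEL_A_LABELS) -> float:
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy gold labels and made-up model answers, for illustration only.
    gold = ["OFF", "NOT", "NOT", "OFF"]
    answers = ["Yes, it is.", "No.", "yes", "no"]
    pred = [parse_response(a) for a in answers]
    print(pred, macro_f1(gold, pred))
```

Macro-averaging gives the minority `OFF` class the same weight as `NOT`, which matters because offensive posts are the smaller class in OLID-style data; a system that always predicts `NOT` can have high accuracy but a low macro-F1.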

Publication DOI: https://doi.org/10.1017/S1351324923000517
Divisions: College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies > Applied AI & Robotics
College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies
Additional Information: © The Author(s), 2023. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited
Uncontrolled Keywords: Machine learning, Text classification, Software, Artificial Intelligence, Language and Linguistics, Linguistics and Language
Publication ISSN: 1351-3249
Last Modified: 03 May 2024 07:22
Date Deposited: 06 Dec 2023 12:36
Related URLs: https://www.cam ... C0B732B49DB8CA3 (Publisher URL)
http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Review article
Published Date: 2023-12-06
Accepted Date: 2023-11-06
Authors: Zampieri, Marcos
Rosenthal, Sara
Nakov, Preslav
Dmonte, Alphaeus
Ranasinghe, Tharindu (ORCID Profile 0000-0003-3207-3821)

Version: Published Version

License: Creative Commons Attribution
