A Large-Scale English Multi-Label Twitter Dataset for Cyberbullying and Online Abuse Detection

Salawu, Semiu, Lumsden, Jo and He, Yulan (2021). A Large-Scale English Multi-Label Twitter Dataset for Cyberbullying and Online Abuse Detection. In: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021). Mostafazadeh Davani, Aida; Kiela, Douwe; Lambert, Mathias; Vidgen, Bertie; Prabhakaran, Vinodkumar and Waseem, Zeerak (eds). Association for Computational Linguistics.

Abstract

In this paper, we introduce a new English Twitter-based dataset for online abuse and cyberbullying detection. Comprising 62,587 tweets, this dataset was sourced from Twitter using specific query terms designed to retrieve tweets with high probabilities of various forms of bullying and offensive content, including insult, profanity, sarcasm, threat, porn and exclusion. Analysis performed on the dataset confirmed common cyberbullying themes reported by other studies and revealed interesting relationships between the classes. The dataset was used to train a number of transformer-based deep learning models returning impressive results.

Publication DOI:	https://doi.org/10.18653/v1/2021.woah-1.16
Divisions:	College of Engineering & Physical Sciences > Aston STEM Education Centre; College of Engineering & Physical Sciences > Aston Institute of Urban Technology and the Environment (ASTUTE); College of Engineering & Physical Sciences > Systems analytics research institute (SARI); College of Engineering & Physical Sciences
Additional Information:	Copyright © 2021 The Association for Computational Linguistics. Licensed under the Creative Commons Attribution license https://creativecommons.org/licenses/by/4.0/
Event Title:	The 5th Workshop on Online Abuse and Harms
Event Type:	Other
Event Dates:	2021-08-06 - 2021-08-06
ISBN:	9781954085596
Last Modified:	20 Feb 2026 09:34
Date Deposited:	01 Apr 2022 13:40
Related URLs:	https://aclanth ... 1.woah-1.16.pdf (Publisher URL); https://bitbuck ... llying-twitter/ (Related URL)
PURE Output Type:	Conference contribution
Published Date:	2021-08-06
Accepted Date:	2021-08-01
Authors:	Salawu, Semiu; Lumsden, Jo (ORCID 0000-0002-8637-7647); He, Yulan (ORCID 0000-0003-3948-5845)

Version: Published Version

License: Creative Commons Attribution