Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

Gaikwad, Saurabh, Ranasinghe, Tharindu, Zampieri, Marcos and Homan, Christopher (2021). Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021).

Abstract

The widespread presence of offensive language on social media has motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results from several machine learning experiments on this dataset, including zero-shot and other transfer learning experiments on state-of-the-art cross-lingual transformers from existing data in Bengali, English, and Hindi.

Publication DOI:	https://doi.org/10.48550/arXiv.2109.03552
Additional Information:	This accepted manuscript is distributed under the terms of the Creative Commons Attribution License CC BY [https://creativecommons.org/licenses/by/4.0/], which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Last Modified:	16 Mar 2026 08:02
Date Deposited:	15 May 2023 11:19
Related URLs:	https://arxiv.o ... /abs/2109.03552 (Publisher URL)
PURE Output Type:	Conference contribution
Published Date:	2021-09
Authors:	Gaikwad, Saurabh; Ranasinghe, Tharindu (ORCID: 0000-0003-3207-3821); Zampieri, Marcos; Homan, Christopher

Version: Accepted Version

License: Creative Commons Attribution
