Catalogue of abusive language training data


Uploaded: 2022-02-09
Languages: English, Arabic, Bengali, Croatian, Danish, Estonian, Greek, Hindi, Indonesian, Latvian, Portuguese, Polish, Russian, Slovene, Ukrainian, French, Chinese, German, Spanish, Turkish
Collected from: 2020
To: 2022
Access category: External

Summary

A multilingual catalogue of large annotated datasets containing hate speech, abusive language, and offensive language (as defined by the authors).


Subject keywords: NLP, abusive language, online language
Data types: Written
Funders: N/A
Associated AIFL centres: Forensic Linguistic Databank (FoLD)
License: Non-Commercial Government Licence for public sector information

Description

The resource catalogues datasets annotated for hate speech, online abuse, and offensive language in many languages. These datasets are intended for computational purposes, e.g. training a natural language processing system, but may be useful to other researchers as well.

Associated publication: Vidgen B, Derczynski L (2020) Directions in abusive language training data, a systematic review: Garbage in, garbage out. PLoS ONE 15(12): e0243300. https://doi.org/10.1371/journal.pone.0243300


Data Owners


Link

The data is stored externally. Please follow the link below for access.