Catalogue of abusive language training data
Uploaded: 2022-02-09
Languages: English, Arabic, Bengali, Croatian, Danish, Estonian, Greek, Hindi, Indonesian, Latvian, Portuguese, Polish, Russian, Slovene, Ukrainian, French, Chinese, German, Spanish, Turkish
Collected from: 2020
Summary
A multilingual catalogue of large annotated datasets containing hate speech, abusive language, and offensive language (as defined by the authors).
Subject keywords: NLP, abusive language, online language
Data types: Written
Funders: N/A
Associated AIFL centres: Forensic Linguistic Databank (FoLD)
License: Non-Commercial Government Licence for public sector information
Description
The resource catalogues datasets annotated for hate speech, online abuse, and offensive language across multiple languages. These datasets are intended for computational purposes, e.g. training a natural language processing system, but may be useful to other researchers as well.
Associated publication: Vidgen B, Derczynski L (2020) Directions in abusive language training data, a systematic review: Garbage in, garbage out. PLoS ONE 15(12): e0243300. https://doi.org/10.1371/journal.pone.0243300
Data Owners
Link
The data is stored externally. Please follow the link below for access.