Child Sexual Abuse Dark-Web Discussion Forum Corpus
Summary
Dark Web CSEA forum corpus
Data types: Written
Funders: N/A
Associated AIFL centres: Centre for Forensic Text Analysis (FTA)
License: N/A
Description
This dataset comprises 114,756 posts collected from a Dark Web child abuse forum. The data is exclusively textual and was scraped for research use by the Aston Institute for Forensic Linguistics. The posts were written by 2,074 different forum users. The corpus contains around 8.4 million tokens.
Data Donors
Controlled
Information: This dataset contains highly sensitive material or third-party data that is subject to heavy constraints on access and use. It is therefore stored not on the FoLD web server but on an air-gapped, offline computer in our secure data lab at the Aston Institute for Forensic Linguistics. Users who wish to access this dataset must make a detailed application to FoLD and the researcher, and may also need additional agreement from an external organisation before access can be approved.