Yapla authorship corpus
Russian Reddit-like website
Data types: Written
Funders: Research England (UKRI)
Associated AIFL centres: Centre for Forensic Text Analysis (FTA)
The present dataset includes 44,021,503 written posts collected from a Russian website structured similarly to Reddit. The posts were authored by 245,367 different users of the website. The final corpus comprises more than 1 billion tokens. The data was collected and structured as an authorship analysis research corpus by the Aston Institute for Forensic Linguistics.