Yapla authorship corpus

Uploaded: 2021-09-08
Languages: Russian
Collected from: 2021
To: 2021
Access category: Restricted
Email: Not available


Russian Reddit-like website

Subject keywords: corpus, Russian, social media, forum
Data types: Written
Funders: Research England (UKRI)
Associated AIFL centres: Centre for Forensic Text Analysis (FTA)
License: N/A


The dataset comprises 44,021,503 written posts collected from a Russian website structured similarly to Reddit. The posts were authored by 245,367 different users of the website. The final corpus amounts to more than 1 billion tokens. The data was collected and structured as an authorship analysis research corpus by the Aston Institute for Forensic Linguistics.

Data Donors


Information: This dataset contains sensitive material or data that comes from a third party and is subject to constraints on access and use. Users who wish to access this dataset must make a detailed application to FoLD and the researcher, and may also need additional agreement from an external organisation before access can be approved.
