Wykop authorship corpus
Polish Reddit-like forum
Data types: Written
Funders: Research England (UKRI)
Associated AIFL centres: Centre for Forensic Text Analysis (FTA)
The present dataset includes 12,730,802 written posts collected from a Polish website structured similarly to Reddit. The posts were authored by 143,516 different users of the website. The final corpus amounts to more than 345 million tokens. The data was collected and structured as an authorship analysis research corpus by the Aston Institute for Forensic Linguistics.