Uploaded: 2021-05-12
Languages: English
Collected from: 2000
To: 2020
Access category: External


The TextCrimes project is a collaboration within Aston University between the Aston Institute for Forensic Linguistics (AIFL) and Beautiful Canoe, the Aston student software company. TextCrimes provides a corpus of malicious communications.

Subject keywords: malicious communications, abusive communications, threatening communications
Data types: Written
Funders: Research England (UKRI)
Associated AIFL centres: Forensic Linguistic Databank (FoLD)
License: N/A


Through TextCrimes.com, we provide tagged data sets of malicious communications, which can be downloaded or examined using an integrated set of search and analysis tools. We use the term malicious communications broadly to include text types and genres that cause harm, including but not limited to abusive and threatening communications.

The TextCrimes corpus is structured as discrete collections of texts. Each text collection can have a different owner, who controls access to that collection. User access to any collection can be set to anonymised or full access. Users may have access to more than one text collection and can run queries to create sub-collections, which they can save as their own personal collections. These in turn may be shared with other users (assuming those users have the correct permissions to see the original texts).

Within a text collection, texts can be linked into series in a variety of ways. These include:

- Letters sent by a particular individual: one substantial series is the Operation Heron series, which relates to a case in which all the letters were written by Margaret Walker.
- Letters received by a particular individual: within the FBI Vault data there is a series of all letters in the vault that were sent to Elizabeth Taylor.
- Letters associated with a particular case, whether or not they are thought to have the same authorship.

As of July 2020, the corpus contains two collections amounting to 229 texts. These are:

- Public data: texts in this collection have been obtained through internet searches, and most come from the FBI Vault. We have transcribed and tagged these texts.
- Case files from resolved cases: some individuals within AIFL work as forensic linguistic practitioners, and in some of their cases they are able to share the data after the case has been resolved. Where texts have been produced in public court cases, or are otherwise in the public domain, they will be provided unanonymised; otherwise case texts will be provided in anonymised form.
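
The collection, access and series structure described above can be sketched in Python. This is a minimal illustration only, not the TextCrimes implementation: every class, field and method name here is hypothetical, and the sketch assumes that a collection owner always has full access and that a user needs at least anonymised access to the source collection before a sub-collection can be shared with them. Neither assumption is stated in the description.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    """The two access levels named in the description."""
    ANONYMISED = "anonymised"
    FULL = "full"


@dataclass
class Text:
    text_id: str
    body: str                # original transcription
    anonymised_body: str     # version shown to anonymised-access users
    series: set[str] = field(default_factory=set)  # a text may belong to several series


@dataclass
class Collection:
    name: str
    owner: str
    texts: dict[str, Text] = field(default_factory=dict)
    grants: dict[str, Access] = field(default_factory=dict)

    def add(self, text: Text) -> None:
        self.texts[text.text_id] = text

    def grant(self, acting_user: str, user: str, level: Access) -> None:
        # Only the owner controls access to the collection.
        if acting_user != self.owner:
            raise PermissionError("only the collection owner can set access")
        self.grants[user] = level

    def access_level(self, user: str) -> Access | None:
        # Assumption: the owner implicitly has full access.
        if user == self.owner:
            return Access.FULL
        return self.grants.get(user)

    def read(self, user: str, text_id: str) -> str:
        level = self.access_level(user)
        if level is None:
            raise PermissionError(f"{user} has no access to {self.name}")
        text = self.texts[text_id]
        return text.body if level is Access.FULL else text.anonymised_body

    def texts_in_series(self, series_name: str) -> list[Text]:
        return [t for t in self.texts.values() if series_name in t.series]


@dataclass
class SubCollection:
    """A saved query result that refers back to texts in a source collection."""
    name: str
    owner: str
    source: Collection
    text_ids: list[str]
    shared_with: set[str] = field(default_factory=set)

    def share_with(self, user: str) -> None:
        # Sharing requires that the recipient can already see the original texts.
        if self.source.access_level(user) is None:
            raise PermissionError(f"{user} cannot see the original texts")
        self.shared_with.add(user)
```

Storing text identifiers in the sub-collection, rather than copies of the texts, means that a recipient still reads each text through the source collection and therefore sees it at their own access level.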

Data Owners


The data is stored externally. Please follow the link below for access.