The 100 idiolects project

Uploaded: 2021-09-08
Languages: English
Collected from: 2020
To: 2021
Access category: Restricted


A multichannel corpus of 100 English speakers

Subject keywords: multimodal corpus, idiolect, genre analysis
Data types: Written, Spoken - transcript
Funders: Research England (UKRI)
Associated AIFL centres: Centre for Forensic Text Analysis (FTA)
License: N/A


This project uses language data from more than 100 individuals (undergraduate students at Aston University) in different media and contexts, for different purposes and audiences. 112 individuals provided samples for the following discourse types: oral interview, university essay, email, text message, image description with speech-to-text software, and business memo. In addition, 66 of the 112 individuals provided a handwritten text. All interviews and image description recordings were transcribed by the same person for consistency. The full verbatim transcripts include hesitations, false starts, non-standard grammar, etc. In all data, identifying information is marked with XML tags so that context is retained.

Data Donors


Information: This dataset contains sensitive material or data that come from a third party and have some constraints on access and use. Users who wish to access this dataset must make a detailed application to FoLD and the researcher, and may also need to obtain additional agreement from an external organisation before access can be approved.

