Computer-mediated-conversations in an online platform between known and unknown authors

Uploaded: 2021-02-08
Languages: Greek
Collected from: 2020-07-10
To: 2020-08-05
Access category: Open


The datasets consist of online conversations via Messenger between the researcher and participants. A forensic scenario was created: the first sample of each conversation is by a known author-participant (days 1 and 2) and the second sample is of unknown authorship (day 3), resulting in a linguistic situation appropriate for authorship analysis.

Subject keywords: Computer-Mediated Discourse, Authorship Analysis, Online Instant Messaging, Stylistics
Data types: Written
Funders: N/A
Associated AIFL centres: Forensic Linguistic Databank (FoLD)
License: CC0 (The Creative Commons Public Domain Dedication)


The data collection was specifically designed to elicit computer-mediated conversations that are as naturally occurring as possible, through a role-play simulation activity held in three-day sessions on “Messenger”, a widely known and commonly used platform for everyday online message exchange. Every dataset shares certain features. Each conversation is an informal computer-mediated exchange between two people who share a close, friendly relationship, which removes possible power imbalances and formal discourse conventions. The messages were exchanged spontaneously; no stimulus or trigger was used to prompt them. Because the datasets were collected to build a simulated forensic scenario suitable for authorship analysis, the data respect realistic forensic conditions: the conversations are informal and free of written discourse conventions, which makes them fertile ground for stylistic features to emerge. The genre of instant online messaging includes variations within the norm as well as deviations from it, and these encourage more idiosyncratic stylistic features.
The data that I contribute consist of five conversations, each between me as the researcher and one participant. Each dataset is divided into three-day sessions: on days one and two the participant is of known authorship, while on day three the participant is of unknown authorship, which yields a forensic scenario appropriate for authorship analysis. The researcher is always the same person and is of known authorship. Each dataset has the structure of a typical computer-mediated exchange of messages, with turns alternating between the researcher and the participant. The researcher begins and ends every conversation in every session, but the adjacency pairs within each conversation occurred naturally.
Note that the Greek for “researcher” is “Ερευνητής” and for “participant” is “Συμμετέχων”, so the turns in each conversation are marked with the initials “Ε” and “Σ” respectively. Emoticons and other CMD (computer-mediated discourse) reactions are described in parentheses. The size of the datasets ranges from 1,556 to 2,011 words per conversation, for a total of 8,659 words. Finally, all participant names have been replaced by other names of the same origin/language, to ensure anonymity while preserving the stylistic information that each name reference carries.
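For readers who want to process the transcripts programmatically, the turn marking described above could be read along the following lines. This is only a minimal sketch, not the depositor's tooling: the record does not specify the file layout, so the one-turn-per-line "initial, colon, message" format, the function names `parse_turns` and `word_counts`, and the choice to leave parenthesized reaction descriptions out of the word count are all assumptions. The sample text is invented for illustration and is not taken from the datasets.

```python
import re
from collections import Counter

# Speaker initials as described in the record (Greek capital letters).
RESEARCHER = "\u0395"   # Ε, for Ερευνητής (researcher)
PARTICIPANT = "\u03a3"  # Σ, for Συμμετέχων (participant)

# ASSUMPTION: one turn per line, written as "<initial>: <message>".
TURN = re.compile(rf"^\s*([{RESEARCHER}{PARTICIPANT}])\s*:\s*(.*)$")
# Emoticons and other reactions are described inside parentheses.
REACTION = re.compile(r"\(([^()]*)\)")


def parse_turns(text):
    """Split a transcript into turns with speaker, words and reactions."""
    turns = []
    for line in text.splitlines():
        match = TURN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        speaker, message = match.groups()
        reactions = REACTION.findall(message)
        # ASSUMPTION: reaction descriptions are not counted as words.
        words = REACTION.sub(" ", message).split()
        turns.append({"speaker": speaker, "words": words, "reactions": reactions})
    return turns


def word_counts(turns):
    """Count words per speaker initial."""
    counts = Counter()
    for turn in turns:
        counts[turn["speaker"]] += len(turn["words"])
    return counts


# Invented two-turn sample, not taken from the datasets.
sample = (
    f"{RESEARCHER}: Καλημέρα, τι κάνεις;\n"
    f"{PARTICIPANT}: Όλα καλά (χαμογελαστό πρόσωπο) εσύ;\n"
)
turns = parse_turns(sample)
counts = word_counts(turns)
```

Keeping the reaction descriptions in a separate field means they can be analysed as stylistic markers in their own right without inflating the word totals; whether the published word counts were computed this way is not stated in the record.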

