CorIELLS (CORpus of Italian and English Legal Lay textS)


Uploaded: 2021-02-08
Languages: English, Italian
Collected from: 2019-11-01
To: 2020-12-01
Access category: Open

Summary

Corpus of legal lay documents in English and Italian. The corpus includes two subcorpora, one for Italian and one for English documents. It is designed to be partially parallel and partially idiosyncratic.


Subject keywords: legal lay language, bilingual corpus, corpus linguistics
Data types: Written
Funders: N/A
Associated AIFL centres: Forensic Linguistic Databank (FoLD)
License: non-commercial

Description

The corpus was collected throughout 2020, combining semi-automatic and manual searches via the BootCaT toolkit. The final selection comprises different text types, all of which fall under the general definition of "legal lay language", i.e. legal texts intended to be read, and most critically understood, by a non-specialized audience.

The corpus includes:

1. 247 summaries of European legislation per language (all summaries published in 2019).
2. 45 website terms of use per language (taken from the Alexa 500 most-clicked websites in Italy and the UK for 2019).
3. 15 standard bank contracts per language (only documents freely accessible online).
4. 15 utilities contracts per language (5 phone, 5 gas and electricity, 5 Internet providers; only documents freely accessible online).

Parts 1 and 2 are designed to be parallel in the two languages, while parts 3 and 4 are independent for each language. The Italian subcorpus amounts to 1M tokens, while the English subcorpus amounts to 800K tokens.

Associated publications:

Busso, L. (2021). CorIELLS: A specialised bilingual corpus of lay legal communication.
Busso, L. (forthcoming). Lexicon and Grammar in Legal-Lay Language: a Quantitative Corpus Study on Italian. To appear in: Studi Italiani di Linguistica Teorica e Applicata (SILTA), 2022.
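
The composition given in the description can be tallied with a short sketch. The per-language document counts and the parallel/independent split are taken from the description; the dictionary layout, the `summarize` helper, and the component labels are illustrative assumptions, not part of the corpus distribution.

```python
# Per-language document counts as stated in the description.
# The structure and names below are hypothetical, for illustration only.
COMPONENTS = {
    "EU legislation summaries": {"docs": 247, "parallel": True},
    "website terms of use": {"docs": 45, "parallel": True},
    "bank contracts": {"docs": 15, "parallel": False},
    "utilities contracts": {"docs": 15, "parallel": False},
}


def summarize(components):
    """Return document totals per language and across both subcorpora."""
    per_language = sum(c["docs"] for c in components.values())
    parallel = sum(c["docs"] for c in components.values() if c["parallel"])
    return {
        "per_language": per_language,
        "parallel": parallel,
        "independent": per_language - parallel,
        "total": 2 * per_language,  # two subcorpora: Italian and English
    }


summary = summarize(COMPONENTS)
print(summary)
```

Under these figures each subcorpus holds 322 documents, of which 292 belong to the parallel parts and 30 to the language-specific parts.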


Data Donors


Files

Here are the files submitted for this Item.