Tripadvisor Corpus (Peninsular and Mexican Spanish speakers)


Uploaded: 2021-02-16
Languages: English
Collected from: 2019-10-01
To: 2020-06-01
Access category: Open
Email: Not available

Summary

The corpus includes two subcorpora, one with L2 English data produced by L1 Mexican Spanish speakers (word count: 37,500), and one with L2 English data produced by L1 Peninsular Spanish speakers (word count: 40,602).


Subject keywords: NLID, Typology, Language variation, Mexican Spanish, Peninsular Spanish
Data types: Written
Funders: N/A
Associated AIFL centres: Centre for Forensic Text Analysis (FTA)
License: CC0 (The Creative Commons Public Domain Dedication)

Description

We compiled two corpora, one with L2 English data produced by L1 Mexican Spanish speakers (word count: 37,500) and another with L2 English data produced by L1 Peninsular Spanish speakers (word count: 40,602). To ensure that the authors were native speakers of their respective dialects, we selected posts in which the authors asserted their linguistic identity (e.g. “I am / I’m Mexican / Spanish / from Mexico / from Spain”), written by users with a geolocator in their account. Additionally, we made sure that the participants had a native-like command of Spanish by checking their other posts. The corpus was compiled from the most recent 25 entries per author, or all of them if there were fewer. In total, there are 504 entries in the Peninsular Spanish corpus and 514 entries in the Mexican Spanish corpus, with 38 different authors for the Peninsular Spanish corpus and 37 different authors for the Mexican Spanish corpus. Diatopically, the authors come from different parts of Mexico, but only from regions of Spain where Peninsular Spanish is not in contact with other languages (i.e. Central and Southern Spain).
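
The two automatable selection steps described above (matching an identity assertion and keeping at most 25 recent entries per author) could be approximated along the following lines. This is only an illustrative sketch, not the procedure actually used to build the corpus: the regular expression, the function names, and the `(author, iso_date, text)` tuple layout are all assumptions, and the checks on geolocation and native-like command of Spanish are not covered.

```python
import re
from collections import defaultdict

# Hypothetical pattern approximating the identity assertions quoted in the
# description ("I am / I'm Mexican / Spanish / from Mexico / from Spain").
IDENTITY_RE = re.compile(
    r"\bI(?:\s+am|['’]m)\s+(?:(Mexican|Spanish)\b|from\s+(Mexico|Spain)\b)",
    re.IGNORECASE,
)


def asserted_variety(text):
    """Return 'Mexican', 'Peninsular', or None for the text of one post."""
    match = IDENTITY_RE.search(text)
    if match is None:
        return None
    token = (match.group(1) or match.group(2)).lower()
    return "Mexican" if token.startswith("mexic") else "Peninsular"


def most_recent_entries(posts, limit=25):
    """Keep each author's `limit` newest posts, or all of them if fewer.

    `posts` is an iterable of (author, iso_date, text) tuples; this layout
    is assumed for illustration. ISO dates sort correctly as plain strings.
    """
    by_author = defaultdict(list)
    for author, iso_date, text in posts:
        by_author[author].append((iso_date, text))
    return {
        author: sorted(items, reverse=True)[:limit]
        for author, items in by_author.items()
    }
```

A pattern this simple over-matches (for instance, "I am Spanish teacher" would pass), so any hits would still need manual review.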

