Deceptive Opinion Spam Corpus v1.4


Uploaded: 2022-06-08
Languages: English
Collected from: 2011
To: 2013
Access category: External

Summary

Corpus of truthful and deceptive reviews of 20 Chicago hotels.


Subject keywords: deceptive communication, corpus linguistics, Tripadvisor reviews
Data types: Written
Funders: N/A
Associated AIFL centres: None
License: CC0 (The Creative Commons Public Domain Dedication)

Description

This corpus contains:

- 400 truthful positive reviews from TripAdvisor (described in [1])
- 400 deceptive positive reviews from Mechanical Turk (described in [1])
- 400 truthful negative reviews from Expedia, Hotels.com, Orbitz, Priceline, TripAdvisor and Yelp (described in [2])
- 400 deceptive negative reviews from Mechanical Turk (described in [2])

Each of the above datasets consists of 20 reviews for each of the 20 most popular Chicago hotels (see [1] for more details).

The files are named according to the following conventions:

- Each directory prefixed with fold corresponds to a single fold from the cross-validation experiments reported in [1] and [2].
- Files are named according to the format %c_%h_%i.txt, where:
  - %c denotes the class: (t)ruthful or (d)eceptive
  - %h denotes the hotel:
    - affinia: Affinia Chicago (now MileNorth, A Chicago Hotel)
    - allegro: Hotel Allegro Chicago - a Kimpton Hotel
    - amalfi: Amalfi Hotel Chicago
    - ambassador: Ambassador East Hotel (now PUBLIC Chicago)
    - conrad: Conrad Chicago
    - fairmont: Fairmont Chicago Millennium Park
    - hardrock: Hard Rock Hotel Chicago
    - hilton: Hilton Chicago
    - homewood: Homewood Suites by Hilton Chicago Downtown
    - hyatt: Hyatt Regency Chicago
    - intercontinental: InterContinental Chicago
    - james: James Chicago
    - knickerbocker: Millennium Knickerbocker Hotel Chicago
    - monaco: Hotel Monaco Chicago - a Kimpton Hotel
    - omni: Omni Chicago Hotel
    - palmer: The Palmer House Hilton
    - sheraton: Sheraton Chicago Hotel and Towers
    - sofitel: Sofitel Chicago Water Tower
    - swissotel: Swissotel Chicago
    - talbott: The Talbott Hotel
  - %i serves as a counter to make the filename unique

References

[1] M. Ott, Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
[2] M. Ott, C. Cardie, and J.T. Hancock. 2013. Negative Deceptive Opinion Spam. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
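The file naming convention described above lends itself to a small parsing helper. The following is a minimal Python sketch, not part of the corpus distribution. The names parse_review_path and ReviewFile are hypothetical, and the sketch assumes that the %i counter is written as decimal digits and that hotel keys contain only lowercase letters, as in the list above. Any directory whose name starts with "fold" is reported as the fold, without assuming anything further about how the directories are nested.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Pattern for %c_%h_%i.txt. Assumptions (not stated in the corpus notes):
# the counter %i is decimal digits and hotel keys are lowercase letters.
FILENAME_RE = re.compile(r"^(?P<cls>[td])_(?P<hotel>[a-z]+)_(?P<counter>\d+)\.txt$")


class ReviewFile(NamedTuple):
    """Hypothetical container for the fields encoded in a review path."""
    label: str           # "truthful" or "deceptive"
    hotel: str           # hotel key, e.g. "hilton"
    counter: int         # uniqueness counter from the filename
    fold: Optional[str]  # name of the enclosing fold directory, if any


def parse_review_path(path: str) -> ReviewFile:
    """Extract class, hotel, counter and fold directory from a review path."""
    p = PurePosixPath(path)
    m = FILENAME_RE.match(p.name)
    if m is None:
        raise ValueError(f"not a %c_%h_%i.txt filename: {p.name!r}")

    # Use the innermost directory whose name is prefixed with "fold".
    fold = None
    for part in p.parts[:-1]:
        if part.startswith("fold"):
            fold = part

    label = "truthful" if m["cls"] == "t" else "deceptive"
    return ReviewFile(label, m["hotel"], int(m["counter"]), fold)


if __name__ == "__main__":
    # Example path is illustrative only; the real directory layout may differ.
    print(parse_review_path("fold1/d_hilton_3.txt"))
```

Returning the fold as the raw directory name, rather than a parsed integer, avoids guessing at the exact suffix used in the distributed archive.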


Data Owners


Link

The data is stored externally. Please follow the link below for access.