An Update to the Minho Quotation Resource
This is an incremental update to a domain-specific dataset for researchers analyzing business communication.
The authors updated the Minho Quotation Resource by standardizing job titles, imputing missing data, removing duplicates, and adding metaphor/simile extraction and emotion analysis, aiming to improve its usability for studying business communication during the 2008-2012 financial crisis.
The Minho Quotation Resource was originally released in 2012. It provided approximately 500,000 quotes from business leaders, analysts and politicians, spanning the period from 2008 to 2012. The original resource had several failings: many job titles and affiliations were missing; job titles were unnormalised, so the same employment position appeared in many spellings and formats; and there were numerous duplicate quotes. This update standardises the job titles, imputes missing job titles and affiliations, and removes duplicate quotes. It also adds metaphor and simile extraction and an emotion distribution for the quotes. Finally, it replaces an antiquated version of the Lucene index with a JSONL format and supplies a rudimentary interface for querying the data. It is hoped that this update will encourage the study of business communication during a financial crisis.
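As an illustration only, the clean-up steps described above (job-title standardisation and duplicate removal over JSONL records) might be sketched as follows. This is not the authors' pipeline: the field names `speaker`, `job_title` and `quote`, the sample records, and the simple normalisation rule are all assumptions made for the example, and a real standardisation step would probably also map variant titles onto a canonical list.

```python
import json
import re


def normalise_title(title):
    """Lower-case a job title, replace punctuation with spaces, collapse whitespace."""
    if not title:
        return None
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip() or None


def load_quotes(lines):
    """Parse JSONL lines, normalise job titles, and drop duplicate quotes.

    Two records count as duplicates when they share a speaker and the same
    quote text after lower-casing and whitespace collapsing (an assumed rule).
    """
    seen = set()
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        record["job_title"] = normalise_title(record.get("job_title"))
        quote_key = " ".join(record.get("quote", "").lower().split())
        key = (record.get("speaker"), quote_key)
        if key in seen:
            continue  # keep only the first occurrence
        seen.add(key)
        records.append(record)
    return records


# Hypothetical sample records, invented for this sketch.
sample = [
    '{"speaker": "A. Example", "job_title": "Chief Exec. Officer", "quote": "Markets are a rollercoaster."}',
    '{"speaker": "A. Example", "job_title": "chief exec officer", "quote": "Markets are a  rollercoaster."}',
    '{"speaker": "B. Example", "job_title": null, "quote": "We remain cautious."}',
]
quotes = load_quotes(sample)
```

Here the first two sample records collapse into one, and the missing job title in the third is left as `None` for a separate imputation step to fill.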