Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer
This work addresses a data-scarcity problem in natural language processing research, but its contribution is incremental: it focuses on building a dataset rather than proposing new methods.
The authors tackle the lack of datasets and benchmarks for style transfer by creating the largest corpus for formality style transfer, and they show that machine translation techniques can serve as strong baselines for future research.
Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics.