CL, Mar 17, 2018

Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer

arXiv:1803.06535v2, 1206 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a data scarcity problem for researchers in natural language processing, but its contribution is incremental: it builds a dataset rather than proposing new methods.

The authors tackled the lack of datasets and benchmarks in style transfer by creating the largest corpus for formality style transfer, demonstrating that machine translation techniques serve as strong baselines for future research.

Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics.

Code Implementations: 1 repo
