RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text
This paper addresses the challenge of assessing NLG system quality and how humans perceive generated text, though as a tool demonstration it appears incremental.
The paper tackles the problem of evaluating human detection of machine-generated text by introducing RoFT, a website that invites users to detect machine-generated text in various domains, and reports preliminary results for news articles.
In recent years, large neural networks for natural language generation (NLG) have advanced by leaps and bounds in their ability to generate fluent text. However, the tasks of evaluating quality differences between NLG systems and understanding how humans perceive the generated text remain both crucial and difficult. In this system demonstration, we present Real or Fake Text (RoFT), a website that tackles both of these challenges by inviting users to try their hand at detecting machine-generated text in a variety of domains. We introduce a novel evaluation task based on detecting the boundary at which a text passage that starts off human-written transitions to being machine-generated. We show preliminary results of using RoFT to evaluate detection of machine-generated news articles.
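The abstract does not say how boundary-detection items are assembled or scored. The following is a minimal sketch of the idea under stated assumptions: passages are pre-split into sentences, each passage is assumed to be 10 sentences long, and the scoring rule (5 points for naming the exact boundary, one point fewer per sentence guessed too late, zero for guessing before any machine text appears) is hypothetical. The names `BoundaryItem`, `make_item`, and `score_guess` are illustrative and are not taken from RoFT's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundaryItem:
    """A hypothetical boundary-detection item: the first `boundary`
    sentences are human-written, the remainder machine-generated."""
    sentences: List[str]
    boundary: int  # index of the first machine-generated sentence


def make_item(human: List[str], generated: List[str], length: int = 10) -> BoundaryItem:
    """Splice a human-written prefix onto a generated continuation,
    truncating to `length` sentences (assumed passage length)."""
    if not human:
        raise ValueError("passage must start with at least one human sentence")
    prefix = human[:length]
    rest = generated[: max(0, length - len(prefix))]
    return BoundaryItem(sentences=prefix + rest, boundary=len(prefix))


def score_guess(item: BoundaryItem, guess: int, max_points: int = 5) -> int:
    """Hypothetical scoring rule: `guess` is the index of the sentence the
    player flags as the first machine-generated one. Exact boundary earns
    full points, each sentence too late costs one point, too early earns 0."""
    if not 0 <= guess < len(item.sentences):
        raise ValueError("guess must index a sentence in the passage")
    if guess < item.boundary:
        return 0
    return max(0, max_points - (guess - item.boundary))


# Usage with placeholder sentences.
human = [f"Human sentence {i}." for i in range(3)]
generated = [f"Generated sentence {i}." for i in range(10)]
item = make_item(human, generated)
print(item.boundary)         # 3
print(len(item.sentences))   # 10
print(score_guess(item, 3))  # 5 (exact boundary)
print(score_guess(item, 5))  # 3 (two sentences late)
print(score_guess(item, 1))  # 0 (guessed before any machine text)
```

Penalizing late guesses gradually while giving nothing for early ones is one way to reward spotting the transition quickly without rewarding blind early guesses; other scoring schemes are equally plausible.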