Adversarial Evaluation for Models of Natural Language
This work addresses a methodological gap in NLP evaluation and offers a structured way to improve model assessment, though its contribution is incremental, since it builds on existing evaluation concepts.
The paper addresses the evaluation of natural language processing models, particularly unsupervised ones. It proposes an adversarial framework that defines the roles in an evaluation more clearly and encourages earlier error analysis. The framework is versatile: it can be instantiated to simulate familiar intrinsic and extrinsic evaluations as well as new ones.
We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure from text that is less than fully annotated. In this paper, we discuss some of the weaknesses of our current methodology. We present a new abstract framework for evaluating natural language processing (NLP) models in general and unsupervised NLP models in particular. The central idea is to make explicit certain adversarial roles among researchers, so that the different roles in an evaluation are more clearly defined and performers of all roles are offered ways to make measurable contributions to the larger goal. Adopting this approach may help to characterize model successes and failures by encouraging earlier consideration of error analysis. The framework can be instantiated in a variety of ways, simulating some familiar intrinsic and extrinsic evaluations as well as some new evaluations.