NUBIA: NeUral Based Interchangeability Assessor for Text Generation
NUBIA gives researchers and practitioners in natural language processing a modular, explainable evaluation tool, though the contribution is incremental because it builds on existing neural approaches.
The authors address automatic evaluation of text generation by introducing NUBIA, a neural methodology whose demonstrated implementation outperforms the metrics currently used for machine translation and summarization, and matches or slightly exceeds state-of-the-art metrics in correlation with human judgment on the WMT segment-level Direct Assessment task, sentence-level ranking, and image captioning evaluation.
We present NUBIA, a methodology for building automatic evaluation metrics for text generation using only machine learning models as core components. A typical NUBIA model is composed of three modules: a neural feature extractor, an aggregator, and a calibrator. We demonstrate an implementation of NUBIA that outperforms the metrics currently used to evaluate machine translation and summarization, and that matches or slightly exceeds state-of-the-art metrics in correlation with human judgment on the WMT segment-level Direct Assessment task, sentence-level ranking, and image captioning evaluation. The implemented model is modular, explainable, and set to improve continuously over time.
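The three-module composition described above can be illustrated with a minimal structural sketch. It is not the authors' implementation: the class name `ThreeStageMetric`, the three `toy_*` functions, the feature names, and the weights are all hypothetical, and the extractor here uses simple token overlap as a stand-in for a neural model. The abstract does not specify what the calibrator does, so clamping the score to [0, 1] is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Features = Dict[str, float]


def toy_feature_extractor(reference: str, candidate: str) -> Features:
    """Hypothetical placeholder for the neural feature extractor.

    A real extractor would use neural models; this stand-in only
    computes token-set overlap and a length ratio.
    """
    ref_tokens = set(reference.lower().split())
    cand_tokens = set(candidate.lower().split())
    union = ref_tokens | cand_tokens
    if not union:
        # Two empty sentences are treated as identical.
        return {"overlap": 1.0, "length_ratio": 1.0}
    overlap = len(ref_tokens & cand_tokens) / len(union)
    length_ratio = min(len(ref_tokens), len(cand_tokens)) / max(
        len(ref_tokens), len(cand_tokens)
    )
    return {"overlap": overlap, "length_ratio": length_ratio}


def toy_aggregator(features: Features) -> float:
    """Hypothetical aggregator: a fixed weighted sum of the features.

    In practice this stage could be a learned model; the weights
    below are arbitrary illustration values.
    """
    weights = {"overlap": 0.8, "length_ratio": 0.2}
    return sum(weights[name] * value for name, value in features.items())


def toy_calibrator(raw_score: float) -> float:
    """Assumed calibrator behavior: clamp the raw score to [0, 1]."""
    return max(0.0, min(1.0, raw_score))


@dataclass
class ThreeStageMetric:
    """Chains extractor -> aggregator -> calibrator; each part is swappable."""

    extract: Callable[[str, str], Features]
    aggregate: Callable[[Features], float]
    calibrate: Callable[[float], float]

    def explain(self, reference: str, candidate: str) -> Features:
        # Exposing the intermediate features is what makes the score inspectable.
        return self.extract(reference, candidate)

    def score(self, reference: str, candidate: str) -> float:
        features = self.extract(reference, candidate)
        return self.calibrate(self.aggregate(features))


metric = ThreeStageMetric(toy_feature_extractor, toy_aggregator, toy_calibrator)
same = metric.score("the cat sat on the mat", "the cat sat on the mat")
different = metric.score("the cat sat", "dogs bark loudly")
print(same, different)
```

Because each stage is passed in as a plain callable, any one of them can be replaced without touching the others, which is one way to read the modularity the abstract claims; `explain` returns the per-feature values behind a score.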