CL · Sep 12, 2019

VizSeq: A Visual Analysis Toolkit for Text Generation Tasks

Changhan Wang, Anirudh Jain, Danlu Chen, Jiatao Gu

arXiv:1909.05424v1 · 1004 citations

Originality: Synthesis-oriented

AI Analysis

This toolkit addresses the need among researchers and practitioners for more interpretable, detailed evaluation of text generation systems. Its contribution is incremental, since it builds on existing metrics.

The paper introduces VizSeq, a visual analysis toolkit for evaluating text generation systems at both the instance level and the corpus level. It supports multimodal sources and multiple references, and offers visualization in Jupyter notebooks or a web app.

Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE. These metrics, however, are abstract numbers and are not perfectly aligned with human assessment. This suggests inspecting detailed examples as a complement to identify system error patterns. In this paper, we present VizSeq, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks. It supports multimodal sources and multiple text references, providing visualization in a Jupyter notebook or a web app interface. It can be used locally or deployed onto public servers for centralized data hosting and benchmarking. It covers the most common n-gram based metrics, accelerated with multiprocessing, and also provides the latest embedding-based metrics such as BERTScore.
