factgenie: A Framework for Span-based Evaluation of Generated Texts
The work gives researchers and practitioners a tool for assessing text generation models, but the contribution is incremental because it builds on existing span-based evaluation methods.
The authors introduce factgenie, a framework for annotating and visualizing word spans in generated texts. It is used to evaluate phenomena such as semantic inaccuracies, and it supports collecting annotations from both human crowdworkers and large language models.
We present factgenie: a framework for annotating and visualizing word spans in textual model outputs. Annotations can capture various span-based phenomena such as semantic inaccuracies or irrelevant text. With factgenie, the annotations can be collected both from human crowdworkers and large language models. Our framework consists of a web interface for data visualization and gathering text annotations, powered by an easily extensible codebase.
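To make the idea of a span annotation concrete, the following is a minimal Python sketch of how an annotated span over a model output might be represented and displayed. It is not factgenie's actual data format or API: the `SpanAnnotation` class, its field names, the category labels, and the `mark_spans` helper are all hypothetical and chosen only for illustration. The sketch also assumes character offsets and non-overlapping spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanAnnotation:
    """Hypothetical record for one annotated span (not factgenie's schema)."""
    start: int      # character offset where the span begins (inclusive)
    end: int        # character offset where the span ends (exclusive)
    category: str   # illustrative label, e.g. "semantic_inaccuracy"
    source: str     # who produced the annotation: "human" or "llm"


def mark_spans(text: str, spans: list[SpanAnnotation]) -> str:
    """Wrap each annotated span in [category: ...] markers.

    Assumes spans lie within the text and do not overlap.
    """
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor or span.end > len(text) or span.start >= span.end:
            raise ValueError(f"invalid or overlapping span: {span}")
        out.append(text[cursor:span.start])                      # unannotated text
        out.append(f"[{span.category}: {text[span.start:span.end]}]")
        cursor = span.end
    out.append(text[cursor:])                                    # trailing text
    return "".join(out)


# Invented example output with one span from each annotator type.
output = "The tower is 500 m tall. I like towers."
spans = [
    SpanAnnotation(13, 18, "semantic_inaccuracy", "llm"),
    SpanAnnotation(25, 39, "irrelevant", "human"),
]
marked = mark_spans(output, spans)
print(marked)
```

Storing the annotator type in the same record as the span is one simple way to let human and LLM annotations of the same output be compared side by side; a real tool may organize this differently.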