AI, HC | May 5, 2021

LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing

arXiv:2105.01992v1715 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This toolkit addresses the need for efficient, reproducible human evaluation in dialogue system research. The contribution is incremental, since it builds on existing crowdsourcing methods.

The paper tackles the challenge of evaluating dialogue systems by introducing LEGOEval, an open-source toolkit that simplifies human evaluation via Amazon Mechanical Turk, enabling researchers to reproduce results quickly and consistently with a flexible Python API.

We present LEGOEval, an open-source toolkit that enables researchers to easily evaluate dialogue systems in a few lines of code using the online crowdsource platform, Amazon Mechanical Turk. Compared to existing toolkits, LEGOEval features a flexible task design by providing a Python API that maps to commonly used React.js interface components. Researchers can personalize their evaluation procedures easily with our built-in pages as if playing with LEGO blocks. Thus, LEGOEval provides a fast, consistent method for reproducing human evaluation results. Besides the flexible task design, LEGOEval also offers an easy API to review collected data.
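The "LEGO blocks" idea in the abstract can be pictured as Python objects that each name a front-end component, stacked in order to form an evaluation task. The sketch below is only an illustration of that design under stated assumptions: the class names `Page` and `Task`, the component names, and the JSON layout are all hypothetical and are not taken from LEGOEval's actual API.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# NOTE: every name here is hypothetical. This is not LEGOEval's real API,
# only a sketch of how a Python object could map to a UI component.


@dataclass
class Page:
    """One building block: names a front-end component and its properties."""
    component: str
    props: dict[str, Any] = field(default_factory=dict)

    def to_spec(self) -> dict[str, Any]:
        # Plain dict that a front end could use to pick and configure a component.
        return {"component": self.component, "props": self.props}


@dataclass
class Task:
    """An ordered stack of pages that a crowd worker steps through."""
    title: str
    pages: list[Page] = field(default_factory=list)

    def add(self, page: Page) -> "Task":
        self.pages.append(page)
        return self  # returning self allows chained calls

    def to_json(self) -> str:
        spec = {"title": self.title, "pages": [p.to_spec() for p in self.pages]}
        return json.dumps(spec, indent=2)


def build_example_task() -> Task:
    # Assumed evaluation flow: instructions, then a chat, then a survey.
    return (
        Task("Chatbot evaluation")
        .add(Page("Instruction", {"text": "Chat with the bot for five turns."}))
        .add(Page("Chat", {"min_turns": 5}))
        .add(Page("Survey", {"questions": ["How fluent was the bot?"]}))
    )


if __name__ == "__main__":
    print(build_example_task().to_json())
```

Reordering or swapping the `add` calls changes the evaluation procedure without touching any front-end code, which is the kind of flexibility the abstract describes.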

Code Implementations: 1 repo
