AI HC May 5, 2021

LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing

Yu Li, Josh Arnold, Feifan Yan, Weiyan Shi, Zhou Yu

arXiv:2105.01992v1 | 57.77 | 15 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This toolkit addresses the need for efficient, reproducible human evaluation in dialogue system research. Its contribution is incremental, since it builds on existing crowdsourcing methods.

The paper introduces LEGOEval, an open-source toolkit that simplifies human evaluation of dialogue systems on Amazon Mechanical Turk. Its flexible Python API lets researchers reproduce evaluation results quickly and consistently.

We present LEGOEval, an open-source toolkit that enables researchers to easily evaluate dialogue systems in a few lines of code using the online crowdsourcing platform Amazon Mechanical Turk. Compared to existing toolkits, LEGOEval features a flexible task design by providing a Python API that maps to commonly used React.js interface components. Researchers can personalize their evaluation procedures easily with our built-in pages as if playing with LEGO blocks. Thus, LEGOEval provides a fast, consistent method for reproducing human evaluation results. Besides the flexible task design, LEGOEval also offers an easy API to review collected data.
