A large, crowdsourced evaluation of gesture generation systems on common data: The GENEA Challenge 2020
This work addresses inconsistent evaluation in gesture generation for embodied conversational agents by providing a standardized benchmark for researchers; the contribution is incremental in that it concerns evaluation rather than new generation methods.
The paper tackles the lack of comparability in co-speech gesture generation research through the GENEA Challenge, in which teams built systems on a common dataset and the systems were evaluated side by side in a large crowdsourced study, enabling benchmarking of state-of-the-art methods.
Co-speech gestures, i.e. gestures that accompany speech, play an important role in human communication. Automatic co-speech gesture generation is thus a key enabling technology for embodied conversational agents (ECAs), since humans expect ECAs to be capable of multi-modal communication. Research into gesture generation is rapidly gravitating towards data-driven methods. Unfortunately, individual research efforts in the field are difficult to compare: there are no established benchmarks, and each study tends to use its own dataset, motion visualisation, and evaluation methodology. To address this situation, we launched the GENEA Challenge, a gesture-generation challenge wherein participating teams built automatic gesture-generation systems on a common dataset, and the resulting systems were evaluated in parallel in a large, crowdsourced user study using the same motion-rendering pipeline. Since differences in evaluation outcomes between systems are now solely attributable to differences between the motion-generation methods, this enables benchmarking recent approaches against one another in order to get a better impression of the state of the art in the field. This paper reports on the purpose, design, results, and implications of our challenge.