HC, GR, MM
Feb 23, 2021

A large, crowdsourced evaluation of gesture generation systems on common data: The GENEA Challenge 2020

arXiv:2102.11617v1
87 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of inconsistent evaluation in gesture generation for embodied conversational agents by providing a standardized benchmark for researchers. Its contribution is incremental, since it focuses on evaluation rather than on new generation methods.

The paper tackles the lack of comparability in co-speech gesture generation research by launching the GENEA Challenge, in which teams built systems on a common dataset and were evaluated in a large crowdsourced study, enabling state-of-the-art methods to be benchmarked against one another.

Co-speech gestures, gestures that accompany speech, play an important role in human communication. Automatic co-speech gesture generation is thus a key enabling technology for embodied conversational agents (ECAs), since humans expect ECAs to be capable of multi-modal communication. Research into gesture generation is rapidly gravitating towards data-driven methods. Unfortunately, individual research efforts in the field are difficult to compare: there are no established benchmarks, and each study tends to use its own dataset, motion visualisation, and evaluation methodology. To address this situation, we launched the GENEA Challenge, a gesture-generation challenge wherein participating teams built automatic gesture-generation systems on a common dataset, and the resulting systems were evaluated in parallel in a large, crowdsourced user study using the same motion-rendering pipeline. Since differences in evaluation outcomes between systems are now solely attributable to differences between the motion-generation methods, this enables benchmarking recent approaches against one another in order to get a better impression of the state of the art in the field. This paper reports on the purpose, design, results, and implications of our challenge.

Code Implementations: 1 repo