RO · Oct 30, 2018

Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots

arXiv:1810.12541v1 · 275 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enabling humanoid robots to produce natural gestures during speech, which reduces reliance on manual rule-based systems. It is an incremental advance because it builds on existing learning-based approaches.

The authors developed an end-to-end neural network model, trained on 52 hours of TED talks, that generates co-speech gestures for robots. The model produced various gesture types, and participants in a subjective evaluation rated the gestures as human-like and matched to the speech.

Co-speech gestures enhance interaction experiences between humans as well as between humans and robots. Existing robots use rule-based speech-gesture association, but this requires human labor and prior knowledge of experts to be implemented. We present a learning-based co-speech gesture generation that is learned from 52 h of TED talks. The proposed end-to-end neural network model consists of an encoder for speech text understanding and a decoder to generate a sequence of gestures. The model successfully produces various gestures including iconic, metaphoric, deictic, and beat gestures. In a subjective evaluation, participants reported that the gestures were human-like and matched the speech content. We also demonstrate a co-speech gesture with a NAO robot working in real time.
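
The encoder-decoder structure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class name `ToyGestureSeq2Seq`, the plain tanh recurrent cells, the layer sizes, the 10-dimensional pose vector, and the random untrained weights are hypothetical and are not taken from the paper. The sketch only shows the data flow, in which an encoder reads the speech text word by word and a decoder then emits a sequence of pose frames.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_h, x, h):
    """One step of a simple tanh recurrent cell (an assumed cell type)."""
    a = matvec(w_in, x)
    b = matvec(w_h, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


class ToyGestureSeq2Seq:
    """Hypothetical text-to-gesture encoder-decoder with random weights."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=16, pose_dim=10, seed=0):
        rng = random.Random(seed)
        self.vocab = {w: i for i, w in enumerate(vocab)}
        # One extra embedding row is reserved for out-of-vocabulary words.
        self.embed = make_matrix(len(vocab) + 1, embed_dim, rng)
        self.enc_in = make_matrix(hidden_dim, embed_dim, rng)
        self.enc_h = make_matrix(hidden_dim, hidden_dim, rng)
        self.dec_in = make_matrix(hidden_dim, pose_dim, rng)
        self.dec_h = make_matrix(hidden_dim, hidden_dim, rng)
        self.out = make_matrix(pose_dim, hidden_dim, rng)
        self.hidden_dim = hidden_dim
        self.pose_dim = pose_dim

    def encode(self, words):
        """Fold the word sequence into a single hidden state."""
        h = [0.0] * self.hidden_dim
        for w in words:
            idx = self.vocab.get(w, len(self.vocab))
            h = rnn_step(self.enc_in, self.enc_h, self.embed[idx], h)
        return h

    def decode(self, h, n_frames):
        """Emit pose frames one at a time, feeding each pose back in."""
        pose = [0.0] * self.pose_dim
        frames = []
        for _ in range(n_frames):
            h = rnn_step(self.dec_in, self.dec_h, pose, h)
            pose = matvec(self.out, h)
            frames.append(pose)
        return frames

    def generate(self, text, n_frames=30):
        """Map a sentence to a sequence of pose vectors."""
        return self.decode(self.encode(text.lower().split()), n_frames)
```

Calling `ToyGestureSeq2Seq(["hello", "world"]).generate("hello world", n_frames=5)` returns five pose vectors of length 10. Because the weights are random, the output is not a meaningful gesture; a real system would learn the weights from paired speech text and motion data such as the TED talk corpus mentioned above.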

Code Implementations: 5 repos
