Learning multimodal representations for sample-efficient recognition of human actions
This work addresses sample-efficient recognition of human actions by artificial agents in household environments, presenting an incremental improvement built on a novel multimodal representation.
The paper tackles the problem of representing human actions for artificial agents by introducing 'motion concepts', a multimodal representation combining kinematics, location, and held objects, and by developing the OMCL algorithm for learning and recognizing these concepts. OMCL outperforms standard motion recognition algorithms on a one-shot recognition task in a virtual-reality household environment, indicating its potential for sample-efficient recognition.
Humans interact with their environment in rich and diverse ways. However, the representation of such behavior by artificial agents is often limited. In this work we present \textit{motion concepts}, a novel multimodal representation of human actions in a household environment. A motion concept comprises a probabilistic description of the kinematics of the action together with its context, namely the location and the objects held while the action is performed. Furthermore, we present Online Motion Concept Learning (OMCL), a new algorithm that learns novel motion concepts from action demonstrations and recognizes previously learned ones. The algorithm is evaluated in a virtual-reality household environment containing a human avatar. OMCL outperforms standard motion recognition algorithms on a one-shot recognition task, attesting to its potential for sample-efficient recognition of human actions.
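To make the idea of a motion concept more concrete, the following is a minimal illustrative sketch, not the OMCL algorithm itself, whose details are not given here. It assumes that kinematics can be summarized as a fixed-length feature vector modeled by a diagonal Gaussian with a fixed prior variance, that location and held objects are discrete labels, and that a new concept is created whenever no stored concept with matching context explains the demonstration well enough. The names `MotionConcept`, `ToyConceptLearner`, `threshold`, and `prior_var` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MotionConcept:
    """Toy multimodal concept: kinematics (Gaussian) + location + held objects."""
    label: str
    mean: list[float]        # running mean of the kinematic feature vector
    var: float               # assumed fixed, shared variance per dimension
    location: str
    objects: frozenset[str]
    count: int = 1

    def log_likelihood(self, x: list[float]) -> float:
        # Log-density of x under a diagonal Gaussian centred on the running mean.
        return sum(
            -0.5 * (math.log(2 * math.pi * self.var) + (xi - m) ** 2 / self.var)
            for xi, m in zip(x, self.mean)
        )

    def update(self, x: list[float]) -> None:
        # Incremental mean update from one additional demonstration.
        self.count += 1
        self.mean = [m + (xi - m) / self.count for xi, m in zip(x, self.mean)]


class ToyConceptLearner:
    """Recognize a known concept or learn a new one from a single demonstration."""

    def __init__(self, threshold: float, prior_var: float = 1.0) -> None:
        self.threshold = threshold   # assumed minimum log-likelihood for a match
        self.prior_var = prior_var
        self.concepts: list[MotionConcept] = []

    def observe(self, x, location, objects, label=None) -> str:
        objects = frozenset(objects)
        # Context acts as a hard filter: location and held objects must agree.
        candidates = [
            c for c in self.concepts
            if c.location == location and c.objects == objects
        ]
        best = max(candidates, key=lambda c: c.log_likelihood(x), default=None)
        if best is not None and best.log_likelihood(x) >= self.threshold:
            best.update(x)
            return best.label
        # No stored concept explains the demonstration: create one from it.
        new = MotionConcept(
            label=label or f"concept_{len(self.concepts)}",
            mean=list(x), var=self.prior_var,
            location=location, objects=objects,
        )
        self.concepts.append(new)
        return new.label


learner = ToyConceptLearner(threshold=-10.0)
print(learner.observe([0.0, 0.0], "kitchen", {"cup"}, label="drink"))
print(learner.observe([0.1, -0.1], "kitchen", {"cup"}))
```

In this sketch a single labeled demonstration is enough to create a concept that later, similar demonstrations in the same context are assigned to, which is the sense in which such a representation can support one-shot recognition. Treating context as a hard filter is a simplification chosen for brevity; a probabilistic treatment of location and objects would be equally plausible.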