Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition
This work addresses a specific bottleneck for collaborative robots (cobots) in Industry 4.0: recognizing human actions during assembly tasks. The contribution appears incremental, since it builds on existing skeleton-based methods.
The paper tackles the loss of keypoint semantics in skeleton-based action recognition, which limits performance in complex interactions. It uses word embeddings to encode semantic information and reports significant improvements in classification performance and generalization across multiple assembly datasets.
Effective human action recognition is widely used to enable cobots in Industry 4.0 to assist in assembly tasks. However, conventional skeleton-based methods often lose keypoint semantics, which limits their effectiveness in complex interactions. In this work, we introduce a novel approach to skeleton-based action recognition that enriches input representations by leveraging word embeddings to encode semantic information. Our method replaces one-hot encodings with semantic volumes, enabling the model to capture meaningful relationships between joints and objects. Through extensive experiments on multiple assembly datasets, we demonstrate that our approach significantly improves classification performance and enhances generalization by simultaneously supporting different skeleton types and object classes. Our findings highlight the potential of incorporating semantic information to enhance skeleton-based action recognition in dynamic and diverse environments.
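The contrast between one-hot encodings and semantic volumes can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding values, the 4-dimensional embedding size, the label names, the helper functions `one_hot_volume`, `semantic_volume`, and `cosine`, and the choice to concatenate the semantic volume with keypoint coordinates are all assumptions made for illustration. A real system would presumably load pretrained word vectors instead of the toy table used here.

```python
import numpy as np

# Hypothetical 4-dimensional "word embeddings"; the values are made up
# for illustration and do not come from any pretrained model.
TOY_EMBEDDINGS = {
    "wrist": np.array([0.9, 0.1, 0.0, 0.2]),
    "elbow": np.array([0.8, 0.2, 0.1, 0.1]),
    "screwdriver": np.array([0.1, 0.9, 0.3, 0.0]),
}


def one_hot_volume(labels, vocab, num_frames):
    """Return a (T, K, |vocab|) volume with one one-hot row per keypoint."""
    indices = [vocab.index(label) for label in labels]
    per_frame = np.eye(len(vocab))[indices]  # shape (K, |vocab|)
    return np.tile(per_frame[None, :, :], (num_frames, 1, 1))


def semantic_volume(labels, embeddings, num_frames):
    """Return a (T, K, D) volume with one embedding row per keypoint."""
    per_frame = np.stack([embeddings[label] for label in labels])  # shape (K, D)
    return np.tile(per_frame[None, :, :], (num_frames, 1, 1))


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


labels = ["wrist", "elbow", "screwdriver"]  # two joints and one object
num_frames = 5
# Placeholder 3D keypoint coordinates, shape (T, K, 3).
coords = np.zeros((num_frames, len(labels), 3))

one_hot = one_hot_volume(labels, labels, num_frames)
semantic = semantic_volume(labels, TOY_EMBEDDINGS, num_frames)

# Assumed input layout: coordinates followed by the semantic channels.
model_input = np.concatenate([coords, semantic], axis=-1)
```

In this toy setup, any two distinct one-hot rows are orthogonal, so the encoding carries no notion that "wrist" is closer to "elbow" than to "screwdriver", whereas the embedding rows do. The semantic channel width also stays at the embedding dimension when new labels are added, while a one-hot width grows with the vocabulary, which is one plausible reason such a representation could accommodate different skeleton types and object classes at once.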