Pose Forecasting in Industrial Human-Robot Collaboration
This work addresses safety and efficiency in industrial settings by enabling collaborative robots to forecast human poses and detect collisions. The contribution is incremental: it applies a new method to a known bottleneck.
The paper tackles human pose forecasting for industrial human-robot collaboration by proposing a Separable-Sparse Graph Convolutional Network (SeS-GCN). Relative to the state of the art, SeS-GCN reduces the parameter count by 98.28% and runs ~4 times faster at inference, while maintaining comparable accuracy on Human3.6M. The paper also introduces a new benchmark dataset (CHICO), on which SeS-GCN reaches an average error of 85.3 mm (MPJPE) at 1 second in the future for pose forecasting and an F1-score of 0.64 for collision detection.
Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of the spatial, temporal and channel-wise dimensions in GCNs, and it learns sparse adjacency matrices through a teacher-student framework. Compared to the state of the art, it uses only 1.72% of the parameters and is ~4 times faster, while still performing comparably in forecasting accuracy on Human3.6M at 1 second in the future, which enables cobots to be aware of human operators. As a second contribution, we present a new benchmark of Cobots and Humans in Industrial COllaboration (CHICO). CHICO includes multi-view videos, 3D poses and trajectories of 20 human operators and cobots engaged in 7 realistic industrial actions. Additionally, it reports 226 genuine collisions that took place during the human-cobot interaction. We test SeS-GCN on CHICO for two important perception tasks in robotics: human pose forecasting, where it reaches an average error of 85.3 mm (MPJPE) at 1 second in the future with a run time of 2.3 ms, and collision detection, performed by comparing the forecasted human motion with the known cobot motion, where it obtains an F1-score of 0.64.
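The two evaluation quantities named above, MPJPE and the F1-score, can be illustrated with a minimal sketch. The collision rule here is an assumption made purely for illustration: a sequence is flagged when any forecasted human joint comes within a distance threshold of any cobot point in the same frame. The abstract does not specify the paper's actual decision rule, so the function names, the data layout (lists of frames, each a list of (x, y, z) points in millimetres) and the threshold value are all hypothetical.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance
    between predicted and ground-truth joints over all frames.
    pred, target: list of frames, each a list of (x, y, z) tuples in mm."""
    total, count = 0.0, 0
    for frame_pred, frame_true in zip(pred, target):
        for joint_pred, joint_true in zip(frame_pred, frame_true):
            total += math.dist(joint_pred, joint_true)
            count += 1
    return total / count


def min_distance(human_frame, robot_frame):
    """Smallest distance between any human joint and any cobot point."""
    return min(math.dist(h, r) for h in human_frame for r in robot_frame)


def predict_collisions(human_forecasts, robot_motions, threshold_mm):
    """Flag a sequence as a collision if, in any frame, the forecasted
    human pose comes closer than threshold_mm to the known cobot motion.
    NOTE: this thresholding rule is an illustrative assumption."""
    flags = []
    for human_seq, robot_seq in zip(human_forecasts, robot_motions):
        hit = any(
            min_distance(h, r) < threshold_mm
            for h, r in zip(human_seq, robot_seq)
        )
        flags.append(hit)
    return flags


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall over boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

As a usage example, `mpjpe` would be called on a forecasted sequence and its ground truth to obtain an error in millimetres, and `f1_score(predict_collisions(forecasts, robot_motions, threshold_mm=100.0), labels)` would score the flagged sequences against annotated collisions. The 100 mm threshold is an arbitrary placeholder, not a value from the source.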