Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation
This work addresses the need for improved accuracy in computer vision tasks for applications such as surveillance and robotics, but it is incremental because it builds on existing methods for related tasks.
The paper tackles the joint problem of human-object interaction recognition and human pose estimation by proposing a turbo learning framework that iteratively passes messages between the two tasks, achieving state-of-the-art performance on the V-COCO and HICO-DET benchmarks.
Human-object interaction (HOI) recognition and pose estimation are two closely related tasks. Human pose is an essential cue for recognizing actions and localizing the objects being interacted with. Conversely, human actions and the locations of the interacted objects provide guidance for pose estimation. In this paper, we propose a turbo learning framework that performs HOI recognition and pose estimation simultaneously. First, two modules are designed to enforce message passing between the tasks: a pose-aware HOI recognition module and an HOI-guided pose estimation module. Then, these two modules form a closed loop that exploits the complementary information iteratively and can be trained end-to-end. The proposed method achieves state-of-the-art performance on two public benchmarks, Verbs in COCO (V-COCO) and HICO-DET.
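The closed loop described above can be illustrated with a toy sketch. It is not the paper's implementation: the function names `pose_aware_hoi`, `hoi_guided_pose`, and `turbo_loop`, the fixed 0.5 mixing weights, the zero initialization, and the iteration count are all assumptions made for illustration. In the actual framework the two modules would be learned networks trained end-to-end. The sketch only shows the alternation, in which each module consumes the other's latest output.

```python
from typing import List, Tuple

Vector = List[float]


def pose_aware_hoi(image_feat: Vector, pose: Vector) -> Vector:
    """Hypothetical stand-in for the pose-aware HOI recognition module.

    Combines image features with the current pose estimate (assumed 50/50 mix).
    """
    return [0.5 * f + 0.5 * p for f, p in zip(image_feat, pose)]


def hoi_guided_pose(image_feat: Vector, hoi: Vector) -> Vector:
    """Hypothetical stand-in for the HOI-guided pose estimation module.

    Combines image features with the current HOI estimate (assumed 50/50 mix).
    """
    return [0.5 * f + 0.5 * h for f, h in zip(image_feat, hoi)]


def turbo_loop(image_feat: Vector, num_iters: int = 3) -> Tuple[Vector, Vector]:
    """Alternate the two modules so each one receives the other's latest message."""
    pose = [0.0] * len(image_feat)  # assumed initial pose estimate
    hoi = [0.0] * len(image_feat)   # assumed initial HOI estimate
    for _ in range(num_iters):
        hoi = pose_aware_hoi(image_feat, pose)   # pose -> HOI message
        pose = hoi_guided_pose(image_feat, hoi)  # HOI -> pose message
    return hoi, pose


if __name__ == "__main__":
    hoi, pose = turbo_loop([1.0, 2.0], num_iters=3)
    print(hoi, pose)
```

In this toy version both estimates move closer to the input features with each pass, which mirrors only the iterative refinement idea and says nothing about the accuracy of the real method.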