CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation
This work addresses the challenge of accurately detecting human-object interactions (HOI) in computer vision, a capability crucial for applications such as robotics and surveillance, although the contribution appears incremental because it builds on existing detection frameworks.
The paper tackles human-object interaction detection by proposing CycleHOI, a framework that bridges DETR-based detection with text-to-image diffusion models through a cycle consistency loss and feature distillation. Applied to state-of-the-art HOI detectors, it yields significant performance improvements on the HICO-DET and V-COCO datasets.
Recognition and generation are two fundamental tasks in computer vision, which are often investigated separately in the existing literature. However, these two tasks are highly correlated in essence, as both require understanding the underlying semantics of visual concepts. In this paper, we propose a new learning framework, coined CycleHOI, to boost the performance of human-object interaction (HOI) detection by bridging the DETR-based detection pipeline and a pre-trained text-to-image diffusion model. Our key design is a novel cycle consistency loss for training the HOI detector, which explicitly leverages the knowledge captured in the powerful diffusion model to guide detector training. Specifically, we build an extra generation task on top of the instance representations decoded by the HOI detector to enforce detection-generation cycle consistency. Moreover, we perform feature distillation from the diffusion model to the detector encoder to enhance its representation power. In addition, we further utilize the generative power of the diffusion model to augment the training set through both label correction and sample generation. We perform extensive experiments with three HOI detection frameworks on two public datasets, HICO-DET and V-COCO, to verify the effectiveness and generalization power of CycleHOI. The experimental results demonstrate that CycleHOI can significantly improve the performance of state-of-the-art HOI detectors.
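To make the two auxiliary training signals more concrete, the sketch below shows one generic way such terms could be combined with a detection loss. It is not the paper's formulation: the use of mean squared error, the linear projection `proj`, the callable `generator` standing in for the diffusion-based generation task, and the weights `w_cycle` and `w_distill` are all hypothetical choices made for illustration.

```python
import numpy as np

def feature_distillation_loss(det_feats, diff_feats, proj):
    """MSE between projected detector-encoder features and diffusion features.

    Assumption (hypothetical): det_feats is (N, D_det), diff_feats is
    (N, D_diff) and treated as fixed targets, proj is a (D_det, D_diff)
    learnable projection aligning the two feature spaces.
    """
    pred = det_feats @ proj
    return float(np.mean((pred - diff_feats) ** 2))

def cycle_consistency_loss(instance_emb, generator, target):
    """Toy detection -> generation cycle term.

    A hypothetical `generator` maps the detector's decoded instance
    embeddings back to a target signal; the reconstruction error is the
    cycle term. In CycleHOI this role is played by a pre-trained
    text-to-image diffusion model, not a simple function.
    """
    recon = generator(instance_emb)
    return float(np.mean((recon - target) ** 2))

def total_loss(l_det, l_cycle, l_distill, w_cycle=1.0, w_distill=1.0):
    """Weighted sum of detection, cycle, and distillation terms (weights hypothetical)."""
    return l_det + w_cycle * l_cycle + w_distill * l_distill

# Usage with random stand-in tensors.
rng = np.random.default_rng(0)
det = rng.normal(size=(4, 8))          # detector encoder tokens
diff = rng.normal(size=(4, 16))        # diffusion features (targets)
proj = rng.normal(size=(8, 16)) * 0.1  # hypothetical projection
l_distill = feature_distillation_loss(det, diff, proj)
l_cycle = cycle_consistency_loss(det, lambda x: 0.9 * x, det)
print(total_loss(1.0, l_cycle, l_distill, w_cycle=0.5, w_distill=0.1))
```

Keeping the diffusion features as fixed targets reflects the abstract's description of distillation flowing from the diffusion model into the detector encoder; whether gradients also reach the diffusion model is not stated here.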