Closed-Loop Transfer for Weakly-supervised Affordance Grounding
The work addresses the limitation of one-way knowledge transfer in affordance grounding for complex interaction scenarios, though it appears incremental because it builds on existing weakly-supervised methods.
The paper tackles weakly-supervised affordance grounding, in which knowledge from exocentric images is transferred to egocentric images, by introducing LoopTrans, a closed-loop framework that transfers knowledge in both directions. It achieves consistent improvements across all metrics on image and video benchmarks, including in challenging scenarios where interaction regions are occluded.
Humans can perform previously unexperienced interactions with novel objects simply by observing others engage with them. Weakly-supervised affordance grounding mimics this process by learning to locate object regions that enable actions on egocentric images, using exocentric interaction images with image-level annotations. However, extracting affordance knowledge solely from exocentric images and transferring it one-way to egocentric images limits the applicability of previous works in complex interaction scenarios. Instead, this study introduces LoopTrans, a novel closed-loop framework that not only transfers knowledge from exocentric to egocentric but also transfers back to enhance exocentric knowledge extraction. Within LoopTrans, several innovative mechanisms are introduced, including unified cross-modal localization and denoising knowledge distillation, to bridge domain gaps between object-centered egocentric and interaction-centered exocentric images while enhancing knowledge transfer. Experiments show that LoopTrans achieves consistent improvements across all metrics on image and video benchmarks, even handling challenging scenarios where object interaction regions are fully occluded by the human body.