OPDMulti: Openable Part Detection for Multiple Objects
This work addresses a more realistic scenario for robotics and computer vision applications, though the extension from single to multiple objects is incremental.
The paper tackles the problem of detecting openable parts in images with multiple objects, rather than the unrealistic single-object setting, and introduces OPDFormer, which significantly outperforms prior methods.
Openable part detection is the task of detecting the openable parts of an object in a single-view image and predicting the corresponding motion parameters. Prior work investigated the unrealistic setting where every input image contains only a single openable object. We generalize this task to scenes with multiple objects, each potentially possessing openable parts, and create a corresponding dataset based on real-world scenes. We then address this more challenging scenario with OPDFormer, a part-aware transformer architecture. Our experiments show that the OPDFormer architecture significantly outperforms prior work. The more realistic multiple-object scenarios we investigated remain challenging for all methods, indicating opportunities for future work.
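To make the task output concrete, the following is a minimal sketch of what a per-image prediction might look like in the multiple-object setting. The abstract specifies only "openable parts" and "motion parameters"; the class name `OpenablePart`, its fields (a 2D box, a motion type, a motion axis, and a motion origin), and the helper `parts_per_object` are all illustrative assumptions, not the paper's actual data format or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OpenablePart:
    """Hypothetical record for one detected openable part (fields are assumed)."""
    object_id: int                              # which object in the scene owns this part
    box: Tuple[float, float, float, float]      # x_min, y_min, x_max, y_max in pixels
    motion_type: str                            # assumed to be "rotation" or "translation"
    motion_axis: Tuple[float, float, float]     # direction of the motion axis
    motion_origin: Tuple[float, float, float]   # a point on the axis (relevant for rotation)


def parts_per_object(parts: List[OpenablePart]) -> Dict[int, int]:
    """Count detected openable parts for each object in a single image."""
    counts: Dict[int, int] = {}
    for part in parts:
        counts[part.object_id] = counts.get(part.object_id, 0) + 1
    return counts


# A made-up scene with two objects: a cabinet (id 0) with a door and a drawer,
# and a microwave (id 1) with a door. All values are invented for illustration.
scene = [
    OpenablePart(0, (10.0, 20.0, 110.0, 220.0), "rotation", (0.0, 1.0, 0.0), (0.1, 0.0, 0.5)),
    OpenablePart(0, (10.0, 230.0, 110.0, 280.0), "translation", (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)),
    OpenablePart(1, (300.0, 50.0, 400.0, 120.0), "rotation", (0.0, 1.0, 0.0), (0.8, 0.0, 0.4)),
]

counts = parts_per_object(scene)
```

In the single-object setting of prior work, every detected part would share one `object_id`; the multiple-object setting is what makes a grouping like `parts_per_object` meaningful.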