Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection
This work addresses the challenge of efficient viewpoint selection in object-centric learning for visual scene understanding, offering an incremental improvement over existing methods.
The paper tackles the problem of improving object-centric representations in multi-viewpoint learning by proposing an active viewpoint selection strategy that predicts images from unknown viewpoints and selects the viewpoint whose representations differ most from those of the observed images, on the premise that the largest disparity indicates the greatest information gain. The results show significant improvements in segmentation and reconstruction performance compared to random selection, along with accurate prediction of images from unknown viewpoints.
Given the complexities inherent in visual scenes, such as object occlusion, a comprehensive understanding often requires observation from multiple viewpoints. Existing multi-viewpoint object-centric learning methods typically employ random or sequential viewpoint selection strategies. While applicable across various scenes, these strategies may not always be ideal, as certain scenes could benefit more from specific viewpoints. To address this limitation, we propose a novel active viewpoint selection strategy. For each scene, this strategy predicts images from unknown viewpoints based on information from the observed images. It then compares the object-centric representations extracted from the observed and the predicted images and selects the unknown viewpoint with the largest disparity, indicating the greatest gain in information, as the next observation viewpoint. Through experiments on various datasets, we demonstrate the effectiveness of our active viewpoint selection strategy, which significantly enhances segmentation and reconstruction performance compared to random viewpoint selection. Moreover, our method can accurately predict images from unknown viewpoints.
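The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how disparity is measured, so the sketch assumes that each image yields a list of per-object slot vectors, that slots are aligned across viewpoints, and that disparity is the mean Euclidean distance between corresponding slots. The names `disparity`, `select_next_viewpoint`, and the viewpoint labels are hypothetical.

```python
import math


def disparity(observed_slots, predicted_slots):
    """Mean Euclidean distance between corresponding object slots.

    Assumption: both arguments are equal-length lists of slot vectors,
    and slot i refers to the same object in both. The actual disparity
    measure used in the paper is not given in the abstract.
    """
    distances = [math.dist(o, p) for o, p in zip(observed_slots, predicted_slots)]
    return sum(distances) / len(distances)


def select_next_viewpoint(observed_slots, candidate_slots):
    """Return the candidate viewpoint with the largest disparity, plus all scores.

    candidate_slots maps a viewpoint label to the slots extracted from the
    image predicted for that (not yet observed) viewpoint.
    """
    scores = {
        view: disparity(observed_slots, slots)
        for view, slots in candidate_slots.items()
    }
    best_view = max(scores, key=scores.get)
    return best_view, scores


# Toy data: 3 object slots of dimension 4 (illustrative values only).
observed = [[0.0] * 4 for _ in range(3)]
candidates = {
    "view_a": [[0.1] * 4 for _ in range(3)],
    "view_b": [[0.5] * 4 for _ in range(3)],
    "view_c": [[0.2] * 4 for _ in range(3)],
}

best, scores = select_next_viewpoint(observed, candidates)
print(best, scores)
```

In this toy setup the candidate whose predicted representations lie farthest from the observed ones is chosen as the next observation viewpoint. In a full pipeline the slot vectors would come from the model's encoder applied to observed and predicted images, and the loop would repeat after each new observation.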