GPT-4 for Occlusion Order Recovery
The work addresses occlusion challenges in vision models to improve image understanding, but its contribution is incremental because it applies an existing model to a new task.
The paper tackles occlusion order recovery in dense real-world images by giving GPT-4 a specifically designed prompt to predict occlusion relationships, and it reports more accurate zero-shot predictions on the COCOA and InstaOrder datasets.
Occlusion remains a significant obstacle to the robust interpretation of complex, dense real-world images and scenes by current vision models. To address this limitation and enable accurate prediction of the occlusion order between objects, we propose leveraging the capabilities of a pre-trained GPT-4 model to deduce the order. Given a specifically designed prompt together with the input image, GPT-4 analyzes the image and generates order predictions. The response is then parsed to construct an occlusion matrix, which can assist other occlusion handling tasks and image understanding. We report the results of evaluating the model on the COCOA and InstaOrder datasets. The results show that by using semantic context, visual patterns, and commonsense knowledge, the model can produce more accurate order predictions. Unlike baseline methods, the model reasons about occlusion relationships in a zero-shot fashion: it requires no annotated training data and can easily be integrated into occlusion handling frameworks.
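
The parsing step described above, turning the model's text response into an occlusion matrix, might look roughly like the following sketch. The response format is an assumption made for illustration: the abstract does not specify the prompt or the output format, so the sketch supposes one statement per line of the form "i occludes j", where i and j are object indices. The function name parse_occlusion_response and the example response are hypothetical.

```python
import re
from typing import List


def parse_occlusion_response(response: str, num_objects: int) -> List[List[int]]:
    """Build an occlusion matrix M where M[i][j] == 1 means object i occludes object j.

    ASSUMPTION: each relevant line of the response reads "<i> occludes <j>".
    This format is hypothetical and is not taken from the paper.
    """
    matrix = [[0] * num_objects for _ in range(num_objects)]
    pattern = re.compile(r"^\s*(\d+)\s+occludes\s+(\d+)\s*$", re.IGNORECASE)

    for line in response.splitlines():
        match = pattern.match(line)
        if match is None:
            continue  # skip free-form commentary the model may add
        i, j = int(match.group(1)), int(match.group(2))
        # Ignore self-occlusion and indices outside the known object set.
        if i != j and 0 <= i < num_objects and 0 <= j < num_objects:
            matrix[i][j] = 1
    return matrix


# Hypothetical response for an image with three annotated objects.
example_response = """0 occludes 1
2 occludes 1
Some extra commentary from the model."""

occlusion_matrix = parse_occlusion_response(example_response, num_objects=3)
for row in occlusion_matrix:
    print(row)
```

Skipping lines that do not match the pattern keeps the parser tolerant of extra explanation in the response. A real pipeline would likely also need to map object names or mask identifiers to indices, which this sketch leaves out.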