LOCL: Learning Object-Attribute Composition using Localization
LOCL addresses compositional zero-shot learning for computer vision, offering a modular approach that improves performance in cluttered, realistic scenes.
The paper tackles the problem of unseen object-attribute associations in cluttered and realistic scenes, achieving an improvement of about 12% over state-of-the-art methods on challenging datasets.
This paper describes LOCL (Learning Object-Attribute Composition using Localization), which generalizes compositional zero-shot learning to objects in cluttered, more realistic settings. The problem of unseen Object-Attribute (OA) associations has been well studied; however, the performance of existing methods is limited in challenging scenes. In this context, our key contribution is a modular approach to localizing objects and attributes of interest in a weakly supervised setting that generalizes robustly to unseen configurations. Localization coupled with a composition classifier significantly outperforms state-of-the-art (SOTA) methods, with an improvement of about 12% on currently available challenging datasets. Further, the modularity allows the localized feature extractor to be used with existing OA compositional learning methods to improve their overall performance.
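The two-stage idea in the abstract (localize object and attribute features, then classify the composition) can be illustrated with a toy sketch. This is not the paper's architecture: the attention-pooling localizer, the additive pair scoring, and every name here (`localize`, `score_compositions`, the random features and prototypes) are assumptions made purely for illustration.

```python
import numpy as np

def localize(region_feats, query):
    """Hypothetical localizer: softmax attention pooling over region features."""
    scores = region_feats @ query                 # one relevance score per region
    weights = np.exp(scores - scores.max())       # numerically stable softmax
    weights /= weights.sum()
    return weights @ region_feats, weights        # pooled feature, attention weights

def score_compositions(obj_feat, attr_feat, obj_protos, attr_protos):
    """Hypothetical composition classifier: additive attribute + object scores."""
    obj_scores = obj_protos @ obj_feat            # shape (n_obj,)
    attr_scores = attr_protos @ attr_feat         # shape (n_attr,)
    # Every (attribute, object) pair gets a score, including pairs never
    # seen together during training.
    return attr_scores[:, None] + obj_scores[None, :]

# Toy data (assumed): 6 image regions with 8-dim features, 3 objects, 4 attributes.
rng = np.random.default_rng(0)
region_feats = rng.normal(size=(6, 8))
obj_query, attr_query = rng.normal(size=8), rng.normal(size=8)
obj_protos, attr_protos = rng.normal(size=(3, 8)), rng.normal(size=(4, 8))

obj_feat, obj_w = localize(region_feats, obj_query)
attr_feat, attr_w = localize(region_feats, attr_query)
pair_scores = score_compositions(obj_feat, attr_feat, obj_protos, attr_protos)
best_attr, best_obj = np.unravel_index(pair_scores.argmax(), pair_scores.shape)
print("predicted (attribute, object) indices:", (int(best_attr), int(best_obj)))
```

Because the localizer only returns pooled features, it could in principle be swapped in front of a different composition scorer, which mirrors the modularity claim in the abstract.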