Predictive Reasoning with Augmented Anomaly Contrastive Learning for Compositional Visual Relations
This work addresses the relatively unexplored challenge of compositional visual relations for visual reasoning, offering a novel method that could enhance AI systems in tasks requiring complex relational understanding.
The paper tackles the problem of compositional visual relations (CVR) by proposing PR-A$^2$CL, which uses augmented anomaly contrastive learning and a predict-and-verify paradigm to identify outlier images, achieving significant performance improvements over state-of-the-art models on the SVRT, CVR, and MC$^2$R datasets.
While visual reasoning for simple analogies has received significant attention, compositional visual relations (CVR) remain relatively unexplored due to their greater complexity. A CVR task requires identifying the outlier among four images, where the other three follow the same compositional rules. To solve CVR tasks, we propose Predictive Reasoning with Augmented Anomaly Contrastive Learning (PR-A$^2$CL). To address the challenge of modelling abundant compositional rules, an Augmented Anomaly Contrastive Learning scheme is designed to distil discriminative and generalizable features by maximizing similarity among normal instances while minimizing similarity between normal instances and anomalous outliers. More importantly, a predict-and-verify paradigm is introduced for rule-based reasoning, in which a series of Predictive Anomaly Reasoning Blocks (PARBs) iteratively leverage features from three of the four images to predict those of the remaining one. In the subsequent verification stage, the PARBs progressively pinpoint the specific discrepancies attributable to the underlying rules. Experimental results on the SVRT, CVR and MC$^2$R datasets show that PR-A$^2$CL significantly outperforms state-of-the-art reasoning models.
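The two ideas in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation. The function names, the InfoNCE-style form of the loss, the temperature value, and the mean-of-context predictor standing in for the learned PARBs are all assumptions made for illustration, and the feature vectors are hand-picked toy values rather than outputs of any trained encoder.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T

def anomaly_contrastive_loss(features, outlier_index, temperature=0.1):
    """Assumed InfoNCE-style loss: pull normal instances together and
    push each normal instance away from the anomalous outlier."""
    sim = cosine_similarity_matrix(features) / temperature
    normal = [i for i in range(len(features)) if i != outlier_index]
    losses = []
    for i in normal:
        for j in normal:
            if i == j:
                continue
            positive = sim[i, j]              # normal vs. normal
            negative = sim[i, outlier_index]  # normal vs. outlier
            # Negative log-softmax of the positive pair.
            losses.append(np.logaddexp(positive, negative) - positive)
    return float(np.mean(losses))

def mean_predictor(context):
    """Hypothetical stand-in for a learned predictor: average the context."""
    return context.mean(axis=0)

def predict_and_verify(features, predictor):
    """Hold out each image in turn, predict its features from the other
    three, and flag the image with the largest prediction error."""
    errors = []
    for held_out in range(len(features)):
        context = np.delete(features, held_out, axis=0)
        predicted = predictor(context)
        errors.append(np.linalg.norm(predicted - features[held_out]))
    errors = np.array(errors)
    return int(np.argmax(errors)), errors

# Toy features: rows 0-2 follow the same "rule", row 3 is the outlier.
features = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [1.0, 0.1],
    [0.0, 1.0],
])

outlier, errors = predict_and_verify(features, mean_predictor)
print("flagged outlier index:", outlier)
print("loss with correct outlier label:", anomaly_contrastive_loss(features, 3))
print("loss with wrong outlier label:", anomaly_contrastive_loss(features, 0))
```

In this toy setting the held-out image that is worst predicted by the other three is the one flagged as the outlier, and the contrastive loss is lower when the true outlier is treated as the anomaly than when a normal image is. In the paper, learned PARBs replace the simple averaging used here.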