CV | LG | ML | Feb 5, 2024

Enhancing Compositional Generalization via Compositional Feature Alignment

arXiv:2402.02851v2 | 4 citations | h-index: 6 | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

The paper addresses a key challenge in real-world ML applications, namely shifts between training and test data distributions. The advance is incremental, since it builds on the existing pretraining-finetuning paradigm.

The paper tackles compositional generalization, where models struggle with unseen domain-class combinations. It proposes Compositional Feature Alignment (CFA), a two-stage finetuning technique that improves the performance of CLIP and DINOv2 models on the CG-Bench benchmarks.
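To make "unseen domain-class combinations" concrete, here is a minimal sketch of how such a split could be built. The function name `compositional_split`, the domain and class names, and the requirement that every domain and every class still appears somewhere in training are illustrative assumptions, not details taken from the paper or from CG-Bench.

```python
from itertools import product


def compositional_split(domains, classes, held_out):
    """Split the (domain, class) grid into seen and unseen combinations.

    Assumption (not from the paper): each domain and each class must still
    occur in at least one training combination, so only the *pairing* is new
    at test time.
    """
    held_out = set(held_out)
    all_pairs = set(product(domains, classes))

    unknown = held_out - all_pairs
    if unknown:
        raise ValueError(f"held-out pairs not in the grid: {sorted(unknown)}")

    seen = all_pairs - held_out
    seen_domains = {d for d, _ in seen}
    seen_classes = {c for _, c in seen}
    if seen_domains != set(domains) or seen_classes != set(classes):
        raise ValueError("every domain and class must appear in training")

    return sorted(seen), sorted(held_out)


# Hypothetical toy grid: 2 domains x 2 classes, one combination held out.
domains = ["photo", "sketch"]
classes = ["cat", "dog"]
train_pairs, test_pairs = compositional_split(
    domains, classes, held_out=[("sketch", "dog")]
)
```

In this toy grid the model sees sketches (of cats) and dogs (in photos) during training, but never a sketch of a dog; that pairing is reserved for evaluation.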

Real-world applications of machine learning models often confront data distribution shifts, wherein discrepancies exist between the training and test data distributions. In the common multi-domain multi-class setup, as the number of classes and domains scales up, it becomes infeasible to gather training data for every domain-class combination. This challenge naturally leads to the quest for models with Compositional Generalization (CG) ability, where models can generalize to unseen domain-class combinations. To study the CG challenge, we develop CG-Bench, a suite of CG benchmarks derived from existing real-world image datasets, and observe that the prevalent pretraining-finetuning paradigm on foundation models, such as CLIP and DINOv2, struggles with the challenge. To address this challenge, we propose Compositional Feature Alignment (CFA), a simple two-stage finetuning technique that i) learns two orthogonal linear heads on a pretrained encoder with respect to class and domain labels, and ii) fine-tunes the encoder with the newly learned heads frozen. We theoretically and empirically justify that CFA encourages compositional feature learning of pretrained models. We further conduct extensive experiments on CG-Bench for CLIP and DINOv2, two powerful pretrained vision foundation models. Experimental results show that CFA outperforms common finetuning techniques in compositional generalization, corroborating CFA's efficacy in compositional feature learning.
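The intuition behind orthogonal class and domain heads can be illustrated with a toy example. The sketch below is not the paper's training procedure and trains nothing: the head matrices, the `toy_feature` construction, and the squared-dot-product penalty are all assumptions chosen for illustration. The abstract does not say how orthogonality is enforced.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, z):
    """Apply a linear head (list of row vectors) to a feature vector."""
    return [dot(row, z) for row in W]


def argmax(xs):
    """Index of the largest entry."""
    return max(range(len(xs)), key=xs.__getitem__)


def orthogonality_penalty(W_class, W_domain):
    """Sum of squared dot products between class rows and domain rows.

    This equals the squared Frobenius norm of W_class @ W_domain^T and is
    zero when every class row is orthogonal to every domain row.
    Assumption: one common way to measure orthogonality, not necessarily
    the one used in the paper.
    """
    return sum(dot(u, v) ** 2 for u in W_class for v in W_domain)


# Hypothetical frozen heads in a 4-dimensional feature space.
W_class = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
W_domain = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def toy_feature(class_id, domain_id):
    """Idealized compositional feature: class direction plus domain direction."""
    return [c + d for c, d in zip(W_class[class_id], W_domain[domain_id])]


# Treat (class 1, domain 1) as a combination never seen during training.
z = toy_feature(class_id=1, domain_id=1)
pred_class = argmax(matvec(W_class, z))
pred_domain = argmax(matvec(W_domain, z))
penalty = orthogonality_penalty(W_class, W_domain)
```

Because the class and domain directions occupy orthogonal subspaces here, the domain component of `z` contributes nothing to the class logits, so the class head reads off the correct class even for a pairing it has never encountered. Real encoder features are only approximately of this form, which is presumably why the second stage fine-tunes the encoder against the frozen heads.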

Code Implementations: 1 repo
