LG · Jun 9, 2024

PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

arXiv:2406.06633v1 · 28 citations
Originality: Incremental advance
AI Analysis

The paper addresses a bias issue in robust model training that matters to machine learning practitioners, but the advance is incremental because it builds on existing CAD methods.

The paper tackles the problem that training with Counterfactually Augmented Data (CAD) can cause models to overfocus on modified features and underperform on out-of-distribution (OOD) datasets. It proposes contrastive learning to enhance feature alignment, and its experiments report state-of-the-art performance on OOD datasets.

Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes. Training with CAD enhances model robustness against spurious features that happen to correlate with labels by spreading the causal relationships across different classes. Yet, recent research reveals that training with CAD may lead models to overly focus on modified features while ignoring other important contextual information, inadvertently introducing biases that may impair performance on out-of-distribution (OOD) datasets. To mitigate this issue, we employ contrastive learning to promote global feature alignment in addition to learning counterfactual clues. We theoretically prove that contrastive loss can encourage models to leverage a broader range of features beyond those modified ones. Comprehensive experiments on two human-edited CAD datasets demonstrate that our proposed method outperforms the state-of-the-art on OOD datasets.
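
The idea of combining a contrastive term with ordinary label learning on paired CAD can be illustrated with a small sketch. This is not the paper's implementation: the function names, the cosine-similarity InfoNCE-style form, the temperature, and the weighting `lam` are all assumptions made here for illustration. The sketch treats an anchor's counterfactual edit (flipped label) as a negative and same-label samples as positives, so the loss falls when the whole representation of the counterfactual moves away from the anchor, not just the edited features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positives, negatives, temperature=0.5):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    positives: embeddings sharing the anchor's label.
    negatives: embeddings with a different label, e.g. the anchor's
               counterfactual edit.
    """
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    # Average the negative log-ratio over all positives.
    return -sum(math.log(p / denom) for p in pos) / len(pos)


def cross_entropy(probs, label):
    """Standard cross-entropy for one sample given class probabilities."""
    return -math.log(probs[label])


def combined_loss(probs, label, anchor, positives, negatives,
                  lam=0.5, temperature=0.5):
    """Weighted sum of the two terms; `lam` is a hypothetical trade-off weight."""
    cl = contrastive_loss(anchor, positives, negatives, temperature)
    ce = cross_entropy(probs, label)
    return lam * cl + (1.0 - lam) * ce


# Toy 2-D embeddings: the anchor, one same-label positive, and two
# candidate embeddings for the anchor's counterfactual.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
cf_far = [-1.0, 0.0]    # counterfactual well separated from the anchor
cf_near = [1.0, 0.05]   # counterfactual almost identical to the anchor

loss_far = contrastive_loss(anchor, [positive], [cf_far])
loss_near = contrastive_loss(anchor, [positive], [cf_near])
print(loss_far < loss_near)
```

In this toy setup the loss is smaller when the counterfactual embedding is well separated from the anchor, which is the behaviour a contrastive term is meant to reward. Setting `lam=0` reduces `combined_loss` to plain cross-entropy.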

Code Implementations: 1 repo
