LG, AI | Sep 9, 2025

ACE and Diverse Generalization via Selective Disagreement

arXiv:2509.07955v1 | h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses spurious correlations in machine learning, a critical obstacle to model robustness and generalization, particularly in domains such as language-model alignment. The contribution is rated incremental because it builds on prior methods for handling spurious correlations.

The paper tackles the sensitivity of deep neural networks to spurious correlations, focusing on cases where the correlation is complete and the correct generalization is therefore underspecified. It proposes ACE, a method that learns a set of concepts through self-training that encourages confident and selective disagreement. ACE matches or outperforms existing methods on complete-spurious-correlation benchmarks and achieves competitive performance on a language-model alignment benchmark without access to untrusted measurements.

Deep neural networks are notoriously sensitive to spurious correlations, where a model learns a shortcut that fails out-of-distribution. Existing work on spurious correlations has often focused on incomplete correlations, leveraging access to labeled instances that break the correlation. But in cases where the spurious correlations are complete, the correct generalization is fundamentally underspecified. To resolve this underspecification, we propose learning a set of concepts that are consistent with training data but make distinct predictions on a subset of novel unlabeled inputs. Using a self-training approach that encourages confident and selective disagreement, our method ACE matches or outperforms existing methods on a suite of complete-spurious correlation benchmarks, while remaining robust to incomplete spurious correlations. ACE is also more configurable than prior approaches, allowing for straightforward encoding of prior knowledge and principled unsupervised model selection. In an early application to language-model alignment, we find that ACE achieves competitive performance on the measurement tampering detection benchmark without access to untrusted measurements. While still subject to important limitations, ACE represents significant progress towards overcoming underspecification.
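
The idea of "confident and selective disagreement" can be made concrete with a small sketch. The abstract does not state ACE's actual objective, so everything below is an illustrative assumption: two binary classifier heads, a per-example penalty that is small only when the heads confidently predict opposite labels, and a hypothetical `mix_rate` parameter that applies the penalty to only the most-disagreeing fraction of an unlabeled batch. The function name `selective_disagreement_loss` is likewise invented for this example.

```python
import math


def selective_disagreement_loss(probs_a, probs_b, mix_rate=0.25, eps=1e-12):
    """Illustrative loss rewarding confident disagreement on a subset of inputs.

    probs_a, probs_b: P(label = 1) from two hypothetical heads, one value per
    unlabeled example. mix_rate: assumed fraction of the batch on which the
    heads are expected to disagree. This is a sketch, not the paper's loss.
    """
    if len(probs_a) != len(probs_b) or not probs_a:
        raise ValueError("probs_a and probs_b must be non-empty and equal length")

    per_example = []
    for pa, pb in zip(probs_a, probs_b):
        # Probability that the heads disagree in each of the two directions.
        a_pos_b_neg = pa * (1.0 - pb)
        a_neg_b_pos = (1.0 - pa) * pb
        # Confident: the penalty is near zero only if one direction is near 1.
        per_example.append(-math.log(max(a_pos_b_neg, a_neg_b_pos) + eps))

    # Selective: average only over the k examples closest to disagreement,
    # leaving the rest of the batch free to agree.
    k = max(1, int(round(mix_rate * len(per_example))))
    selected = sorted(per_example)[:k]
    return sum(selected) / k
```

In this sketch, a batch in which a single example already shows confident disagreement yields a low loss when `mix_rate` selects only that example, while raising `mix_rate` to 1.0 forces disagreement everywhere and yields a higher loss. In practice such a term would be added to an ordinary supervised loss on the labeled training data, so that both heads stay consistent with it.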

