Contrastive Identification and Generation in the Limit

Xiaoyu Li, Andi Han, Jiaojiao Jiang, Junbin Gao

arXiv:2605.0621186.1

AI Analysis

This work provides foundational theoretical results for learning from relational (contrastive) supervision in the limit, which is relevant to computational learning theory and may impact practical contrastive learning methods.

This paper extends the classical identification/generation in the limit models to contrastive learning, where the learner receives unordered pairs of examples with opposite labels. The authors characterize contrastive identifiability with a geometric refinement of Angluin's tell-tale condition, define a contrastive closure dimension for generation with tight sample complexity, and show a strict hierarchy between contrastive generation and text identification. Under finite adversarial corruption, they prove a sharp reversal: some classes are identifiable from contrastive pairs under any finite corruption budget but not from positive examples even with one corrupted observation.

In the classical identification in the limit model of Gold [1967], a stream of positive examples is presented round by round, and the learner must eventually recover the target hypothesis. Recently, Kleinberg and Mullainathan [2024] introduced generation in the limit, where the learner instead must eventually output novel elements of the target's support. Both lines of work focus on positive-only or fully labeled data. Yet many natural supervision signals are inherently relational rather than singleton, which encode relationships between examples rather than labels of individual ones. We initiate the study of contrastive identification and generation in the limit, where the learner observes a contrastive presentation of data: a stream of unordered pairs $\{x,y\}$ satisfying $h(x)\ne h(y)$ for an unknown target binary hypothesis $h$, but which element is positive is hidden from the learner. We first present three results in the noiseless setting: an exact characterization of contrastive identifiable classes (a one-line geometric refinement of Angluin [1980]'s tell-tale condition), a combinatorial dimension called contrastive closure dimension (a contrasitive analogue of the closure dimension in Raman et al. [2025]) and exactly characterizing uniform contrastive generation with tight sample complexity, and a strict hierarchy in which contrastive generation and text identification are mutually incomparable. We then prove a sharp reversal under finite adversarial corruption: there exist classes identifiable from contrastive pairs under any finite corruption budget by a single budget-independent algorithm, yet not identifiable from positive examples under even one corrupted observation. The unifying technical object is the common crossing graph, which encodes pairwise ambiguity, family-level generation obstructions, and corruption defects in a single coverage-and-incidence language.

View on arXiv PDF

Similar