CL AI DS LG

Nov 6, 2025

A Characterization of List Language Identification in the Limit

Moses Charikar, Chirag Pabbaraju, Ambuj Tewari

arXiv:2511.04103v1

10.96 citations

h-index: 9

Originality: Highly original

AI Analysis

This work addresses a foundational problem in computational learning theory: it extends the classic impossibility results for language identification to learners that output a list of guesses, with implications for list-based learning.

The paper tackles the problem of language identification in the limit by extending it to allow learners to output a list of k guesses, providing an exact characterization of language collections that can be k-list identified, and showing that such collections can be identified at an exponential rate in a statistical setting.

We study the problem of language identification in the limit, where given a sequence of examples from a target language, the goal of the learner is to output a sequence of guesses for the target language such that all the guesses beyond some finite time are correct. Classical results of Gold showed that language identification in the limit is impossible for essentially any interesting collection of languages. Later, Angluin gave a precise characterization of language collections for which this task is possible. Motivated by recent positive results for the related problem of language generation, we revisit the classic language identification problem in the setting where the learner is given the additional power of producing a list of $k$ guesses at each time step. The goal is to ensure that beyond some finite time, one of the guesses is correct at each time step. We give an exact characterization of collections of languages that can be $k$-list identified in the limit, based on a recursive version of Angluin's characterization (for language identification with a list of size $1$). This further leads to a conceptually appealing characterization: A language collection can be $k$-list identified in the limit if and only if the collection can be decomposed into $k$ collections of languages, each of which can be identified in the limit (with a list of size $1$). We also use our characterization to establish rates for list identification in the statistical setting where the input is drawn as an i.i.d. stream from a distribution supported on some language in the collection. Our results show that if a collection is $k$-list identifiable in the limit, then the collection can be $k$-list identified at an exponential rate, and this is best possible. On the other hand, if a collection is not $k$-list identifiable in the limit, then it cannot be $k$-list identified at any rate that goes to zero.
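The easy direction of the decomposition characterization can be illustrated on a toy instance. The sketch below is not taken from the paper; it assumes the standard example behind Gold's impossibility result, namely the collection of all finite subsets of the natural numbers together with the single infinite language of all natural numbers. That collection cannot be identified with one guess, but it splits into two identifiable parts, so running one learner per part gives a 2-list learner. All names (`finite_learner`, `infinite_learner`, `list_learner`, `first_time_always_correct`, `ALL_NATURALS`) are hypothetical and chosen for illustration only.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Union

# Sentinel standing in for the infinite language {0, 1, 2, ...} (an assumption
# of this sketch; finite languages are represented as frozensets of ints).
ALL_NATURALS = "ALL_NATURALS"
Language = Union[FrozenSet[int], str]
Learner = Callable[[Sequence[int]], Language]


def finite_learner(examples: Sequence[int]) -> Language:
    """Identifies every finite language: guess exactly the examples seen so far."""
    return frozenset(examples)


def infinite_learner(examples: Sequence[int]) -> Language:
    """Identifies the one-language collection {ALL_NATURALS}: always guess it."""
    return ALL_NATURALS


def list_learner(learners: Sequence[Learner], examples: Sequence[int]) -> List[Language]:
    """k-list learner built from k single-guess learners, one per sub-collection."""
    return [learner(examples) for learner in learners]


def first_time_always_correct(
    target: Language, stream: Sequence[int], learners: Sequence[Learner]
) -> Optional[int]:
    """Earliest step t (1-indexed) such that the target is in the list at every
    step from t to the end of this finite prefix, or None if the final list
    still misses the target."""
    last_wrong = 0
    for t in range(1, len(stream) + 1):
        if target not in list_learner(learners, stream[:t]):
            last_wrong = t
    return last_wrong + 1 if last_wrong < len(stream) else None


if __name__ == "__main__":
    learners = [finite_learner, infinite_learner]
    # Finite target: the list becomes correct once every element has appeared.
    print(first_time_always_correct(frozenset({1, 2, 3}), [1, 2, 2, 3, 1, 3], learners))
    # Infinite target: the second slot of the list is correct from the first step.
    print(first_time_always_correct(ALL_NATURALS, [0, 1, 2, 3], learners))
```

The check runs only on a finite prefix of the stream, so it can suggest but never certify identification in the limit. The converse direction, that every $k$-list identifiable collection admits such a decomposition, is the substantive part of the result and is not captured by this sketch.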
