Greedy Grammar Induction with Indirect Negative Evidence
This work addresses grammar induction for computational linguistics, presenting an incremental method with theoretical efficiency bounds.
The paper tackles the problem of learning Context Free Grammars by introducing a greedy search learner based on indirect negative evidence, which defines a hierarchy of learnable classes and shows that efficiency depends on the target grammar's position in this hierarchy and input richness.
This paper offers a fresh look at the pumping lemma constant as an upper bound on the information required for learning Context Free Grammars. An objective function based on indirect negative evidence considers the occurrences, and non-occurrences, of a finite number of strings, encountered after a sufficiently long presentation. This function has optimal substructure in the hypotheses space, giving rise to a greedy search learner in a branch and bound method. A hierarchy of learnable classes is defined in terms of the number of production rules that must be added to interim solutions in order to incrementally fit the input. Efficiency strongly depends on the position of the target grammar in the hierarchy and on the richness of the input.