Takayuki Kuriyama

82.5 | FL | May 8

Distributional Learning of Context-Free Languages under Fixed Finite-Monoid Typing

Takayuki Kuriyama

We study distributional learning of context-free languages under a fixed recognizable congruence $\sim_h$ given as the kernel of an explicit finite monoid homomorphism $h:\Sigma^*\to M$. For this fixed-$h$ setting, we develop a finite typed reconstruction theory for context-free $\sim_h$-substitutable languages. Starting from a reduced context-free grammar, we introduce a typed refinement that records both yield types and outer context types, show that the relevant structure is concentrated in a finite typed reconstruction basis, and prove that this basis is exposed by a finite observation set. Occurrences of the same nonterminal symbol may therefore have to be separated when their outer $h$-contexts differ. We then prove exact reconstruction from positive data. From any finite sample $K\subseteq\Sigma^*$, we construct a canonical hypothesis grammar $\hat G(K)$, and we show that once $K$ contains the finite observation set associated with the target typed grammar, $\hat G(K)$ generates the target language exactly. Consequently, for every explicit finite monoid homomorphism $h$, the class $\mathcal C_h^{\mathrm{cf}}$ of context-free $\sim_h$-substitutable languages is identifiable in the limit from positive data, with polynomial-time hypothesis construction and update. For the linear subclass $\mathcal C_h^{\mathrm{lin}}$, we further prove polynomial upper bounds on characteristic-sample size and word length. Thus the same learner gives a full polynomial time-and-data result for the linear subclass.
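As a rough illustration of the congruence $\sim_h$ described above, the following minimal sketch evaluates a monoid homomorphism on words and compares two words by their images. The monoid (the integers mod 2 under addition), the letter images, and the names `h`, `equivalent`, and `occurrence_type` are all hypothetical choices made for this example and are not taken from the paper. Reading a "yield type" as $h$ of the substring and an "outer context type" as the pair of $h$-values of the surrounding left and right context is likewise an assumption about the abstract's terminology.

```python
from functools import reduce

# Hypothetical finite monoid M: the integers mod 2 under addition, identity 0.
IDENTITY = 0


def mul(x, y):
    """Monoid product in the example monoid M."""
    return (x + y) % 2


# Hypothetical letter images: h(a) = 1, h(b) = 0,
# so h(w) is the parity of the number of a's in w.
LETTER_IMAGE = {"a": 1, "b": 0}


def h(word):
    """Extend the letter images to a homomorphism from words to M."""
    return reduce(mul, (LETTER_IMAGE[c] for c in word), IDENTITY)


def equivalent(u, v):
    """u ~_h v holds exactly when u and v have the same image under h."""
    return h(u) == h(v)


def occurrence_type(left, middle, right):
    """Assumed reading: (h of left context, h of the yield, h of right context)."""
    return (h(left), h(middle), h(right))
```

Under these assumptions, `equivalent("aa", "b")` holds because both words contain an even number of `a`'s, while two occurrences of the same substring can still receive different `occurrence_type` values when their surrounding contexts have different images.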

56.1 | FL | May 12

Finite Sentence-Interface Control for Learning Bounded-Fan-Out Linear MCFGs under Fixed Monoid Typing

Takayuki Kuriyama

We study positive-data learning of bounded-fan-out linear multiple context-free grammars under a fixed explicit finite monoid homomorphism $h$. The main obstacle beyond the context-free case is that an MCFG nonterminal derives a tuple whose components may be placed in a surrounding sentence in different orders. We introduce sentence-interface types as finite external control objects for such tuple occurrences. A type records the permutation of tuple components in the final sentence together with the $h$-values of the boundary intervals between them. For reduced working binary linear nondeleting MCFG presentations whose string languages satisfy $(f,h)$-tuple substitutability, we build a typed refinement, a finite characteristic sample, and a canonical positive-data learner. Once the sample contains this characteristic sample and remains contained in the target language, the learner reconstructs the language exactly. Consequently, for fixed fan-out bound $f$ and fixed explicit $h$, the resulting class is identifiable in the limit from positive data. Moreover, the hypothesis associated with any given finite sample is constructible in polynomial time for fixed $f$ and fixed $h$, including output size. Thus sentence-interface control is the finite mechanism that lifts fixed-$h$ distributional reconstruction from context-free grammars to bounded-fan-out linear MCFGs.
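The sentence-interface type described above can be pictured with a small sketch. It computes, for one tuple occurrence inside a sentence, the left-to-right order of the tuple components together with the $h$-values of the intervals before, between, and after them. The example homomorphism (parity of the number of `a`'s), the span encoding, and the function name `interface_type` are hypothetical; the paper's formal definition may differ in its details.

```python
from functools import reduce

# Hypothetical homomorphism into the integers mod 2: h(a) = 1, h(b) = 0.
LETTER_IMAGE = {"a": 1, "b": 0}


def h(word):
    """Image of a word under the example homomorphism."""
    return reduce(lambda x, y: (x + y) % 2, (LETTER_IMAGE[c] for c in word), 0)


def interface_type(sentence, spans):
    """Return (order, boundaries) for a tuple occurrence in a sentence.

    spans[i] is the half-open (start, end) span of the i-th tuple component.
    order lists component indices as they appear from left to right.
    boundaries lists h of each interval outside the components, left to right.
    """
    order = sorted(range(len(spans)), key=lambda i: spans[i][0])
    boundaries = []
    pos = 0
    for i in order:
        start, end = spans[i]
        if start < pos:
            raise ValueError("component spans must not overlap")
        boundaries.append(h(sentence[pos:start]))
        pos = end
    boundaries.append(h(sentence[pos:]))
    return tuple(order), tuple(boundaries)
```

For instance, in the sentence `abbab` with component 0 at span `(4, 5)` and component 1 at span `(1, 2)`, component 1 appears first, so the order is `(1, 0)`, and the three boundary intervals `a`, `ba`, and the empty word give the values `(1, 1, 0)`.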
