Multiple Context-Free Tree Grammars: Lexicalization and Characterization
This work belongs to formal language theory as applied to computational linguistics. It contributes a lexicalization result for a class of tree grammars and several characterizations of the generative power of that class.
The paper studies the lexicalization of multiple context-free tree grammars. It shows that every finitely ambiguous grammar of this kind can be transformed into an equivalent lexicalized grammar with bounded increases in rank and multiplicity. It also shows that these grammars have the same tree generating power as multi-component tree adjoining grammars, and that they generate exactly the images of regular tree languages under deterministic finite-copying macro tree transducers.
Multiple (simple) context-free tree grammars are investigated, where "simple" means "linear and nondeleting". Every multiple context-free tree grammar that is finitely ambiguous can be lexicalized; i.e., it can be transformed into an equivalent one (generating the same tree language) in which each rule of the grammar contains a lexical symbol. Under this transformation, the rank of the nonterminals increases by at most 1, and the multiplicity (or fan-out) of the grammar increases by at most the maximal rank of the lexical symbols; in particular, the multiplicity does not increase when all lexical symbols have rank 0. Multiple context-free tree grammars have the same tree generating power as multi-component tree adjoining grammars (provided the latter can use a root-marker). Moreover, every multi-component tree adjoining grammar that is finitely ambiguous can be lexicalized. Multiple context-free tree grammars have the same string generating power as multiple context-free (string) grammars, and they admit polynomial time parsing algorithms. A tree language can be generated by a multiple context-free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer. Multiple context-free tree grammars can be used as a synchronous translation device.
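
As a minimal illustration of two notions used above, the following Python sketch checks the lexicalization condition (every rule contains a lexical symbol) and computes the stated upper bounds on rank and multiplicity. The tree encoding, the `Rule` class, and all function names are hypothetical and chosen only for illustration; the sketch does not model the links between nonterminal occurrences and does not implement the lexicalization transformation itself.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Assumed encoding: a tree is either a variable name (a plain string)
# or a pair (label, children), where children is a tuple of trees.
Tree = Union[str, Tuple[str, tuple]]


@dataclass(frozen=True)
class Rule:
    lhs: str                # hypothetical name of the left-hand side nonterminal
    rhs: Tuple[Tree, ...]   # one tree per component of the left-hand side


def symbols(tree):
    """Yield every node label of a tree; variables (plain strings) are skipped."""
    if isinstance(tree, str):
        return
    label, children = tree
    yield label
    for child in children:
        yield from symbols(child)


def is_lexicalized(rules, lexical):
    """True if every rule has at least one lexical symbol in its right-hand side."""
    return all(
        any(s in lexical for t in rule.rhs for s in symbols(t))
        for rule in rules
    )


def lexicalization_bounds(max_rank, multiplicity, max_lexical_rank):
    """Upper bounds stated in the abstract for the lexicalized grammar:
    nonterminal rank grows by at most 1, multiplicity by at most the
    maximal rank of the lexical symbols."""
    return max_rank + 1, multiplicity + max_lexical_rank


# Two toy rules: the first contains the lexical symbol "a", the second does not.
rules = [
    Rule("A", (("sigma", (("a", ()), "x1")),)),
    Rule("B", (("sigma", ("x1", "x2")),)),
]
print(is_lexicalized(rules[:1], {"a"}))
print(is_lexicalized(rules, {"a"}))
print(lexicalization_bounds(2, 3, 0))
```

With lexical symbols of rank 0, as in the last call, the multiplicity bound equals the original multiplicity, which matches the special case noted in the abstract.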