Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication
This work improves asymptotic parsing bounds relevant to computational linguistics and natural language processing. The contribution is incremental in that it builds on existing fast matrix multiplication techniques.
The paper addresses parsing of linear context-free rewriting systems (LCFRS). It develops a matrix-multiplication-based recognition algorithm for a subset of binary LCFRS with running time O(n^{ωd}), where ω is the matrix multiplication exponent and d is the contact rank of the grammar. Used as a subroutine, this algorithm yields a recognizer for general binary LCFRS with running time O(n^{ωd + 1}). Consequences include an O(n^{4.76}) bound for mildly context-sensitive formalisms such as tree adjoining grammars and an O(n^{5.76}) bound for inversion transduction grammars.
We describe a matrix multiplication recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time $O(n^{\omega d})$, where $M(m) = O(m^{\omega})$ is the running time for $m \times m$ matrix multiplication and $d$ is the "contact rank" of the LCFRS, that is, the maximal number of combination and non-combination points that appear in the grammar rules. We also show that this algorithm can be used as a subroutine to get a recognition algorithm for general binary LCFRS with running time $O(n^{\omega d + 1})$. The best currently known bound on $\omega$ is smaller than $2.38$. Our result provides another proof of the best known result for parsing mildly context-sensitive formalisms such as combinatory categorial grammars, head grammars, linear indexed grammars, and tree adjoining grammars, which can be parsed in time $O(n^{4.76})$. It also shows that inversion transduction grammars can be parsed in time $O(n^{5.76})$. In addition, binary LCFRS subsumes many other formalisms and types of grammars, for some of which we also improve the asymptotic complexity of parsing.
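As a quick check on the stated exponents, the arithmetic below plugs the bound $\omega < 2.38$ into the two running times. The value $d = 2$ is an assumption inferred from the numbers quoted above, since the abstract does not state the contact rank of any particular formalism. Reading the $O(n^{5.76})$ bound as the general-case bound with $d = 2$ is likewise an interpretation, not a claim made in the text.

$$
\omega d < 2.38 \times 2 = 4.76, \qquad \omega d + 1 < 2.38 \times 2 + 1 = 5.76
$$

Under this assumption, the first expression matches the $O(n^{4.76})$ bound quoted for the mildly context-sensitive formalisms and the second matches the $O(n^{5.76})$ bound quoted for inversion transduction grammars.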