Automata Learning: An Algebraic Approach
This work provides a foundational framework for automata learning that addresses a broad range of language types, including those previously not covered, which is significant for researchers in formal methods and theoretical computer science.
The authors tackled the problem of learning unknown formal languages of various types by proposing a generic categorical framework that reduces the task to learning abstract algebraic automata, and they developed a learning algorithm generalizing Angluin's L* that applies to structures like omega-regular languages and yields new algorithms for sorted languages, nominal languages, and cost functions.
We propose a generic categorical framework for learning unknown formal languages of various types (e.g. finite or infinite words, weighted and nominal languages). Our approach is parametric in a monad T that represents the given type of languages and their recognizing algebraic structures. Using the concept of anautomata presentation of T-algebras, we demonstrate that the task of learning a T-recognizable language can be reduced to learning an abstract form of algebraic automaton whose transitions are modeled by a functor. For the important case of adjoint automata, we devise a learning algorithm generalizing Angluin's L*. The algorithm is phrased in terms of categorically described extension steps; we provide for a termination and complexity analysis based on a dedicated notion of finiteness. Our framework applies to structures like omega-regular languages that were not within the scope of existing categorical accounts of automata learning. In addition, it yields new learning algorithms for several types of languages for which no such algorithms were previously known at all, including sorted languages, nominal languages with name binding, and cost functions.