Learning Pairwise Disjoint Simple Languages from Positive Examples
This addresses a novel grammatical inference problem for clustering positive examples in domains like industrial printing, but it appears incremental as it builds on classical DFA identification.
The paper tackles the problem of identifying multiple deterministic finite automata (DFAs) from positive examples belonging to different unknown simple regular languages, proposing two compression-based methods for clustering and applying them to print job data from industrial printers.
A classical problem in grammatical inference is to identify a deterministic finite automaton (DFA) from a set of positive and negative examples. In this paper, we address the related - yet seemingly novel - problem of identifying a set of DFAs from examples that belong to different unknown simple regular languages. We propose two methods based on compression for clustering the observed positive examples. We apply our methods to a set of print jobs submitted to large industrial printers.