LG LOAug 27, 2017

Learning MSO-definable hypotheses on string

Martin Grohe, Christof Löding, Martin Ritzert

arXiv:1708.08081v11.415 citations

Originality Incremental advance

AI Analysis

This addresses computational efficiency challenges for learning logical hypotheses on large string datasets, though it is incremental as it builds on existing MSO and learning theory frameworks.

The paper tackles the problem of learning classification hypotheses defined by monadic second-order logic (MSO) formulas on string data, aiming for algorithms with runtime polynomial in training set size rather than the full dataset. It shows that sublinear-time learning is impossible with local access to unprocessed strings, but becomes possible with linear-time preprocessing to build an index structure.

We study the classification problems over string data for hypotheses specified by formulas of monadic second-order logic MSO. The goal is to design learning algorithms that run in time polynomial in the size of the training set, independently of or at least sublinear in the size of the whole data set. We prove negative as well as positive results. If the data set is an unprocessed string to which our algorithms have local access, then learning in sublinear time is impossible even for hypotheses definable in a small fragment of first-order logic. If we allow for a linear time pre-processing of the string data to build an index data structure, then learning of MSO-definable hypotheses is possible in time polynomial in the size of the training set, independently of the size of the whole data set.

View on arXiv PDF

Similar