LG, CC · Oct 6, 2025

On the Hardness of Learning Regular Expressions

Idan Attias, Lev Reyzin, Nathan Srebro, Gal Vardi

arXiv:2510.04834v1 · 9.41 citations · h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses a fundamental gap in our understanding of the learnability of regular expressions, which are widely used in practice. However, the results are incremental, since they extend known hardness results to this specific formalism.

The paper studies the computational complexity of learning regular expressions. It shows that PAC learning is hard even under the uniform distribution and establishes hardness of distribution-free learning with membership queries. When regular expressions are extended with complement or intersection, it further proves hardness of learning with membership queries even under the uniform distribution.
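To make the two access models concrete, here is a minimal Python sketch of the setting as we read it from the summary: a target regular expression over {0,1} defines a Boolean function on strings of length n, a PAC learner receives examples drawn uniformly from {0,1}^n, and a membership-query learner may request the label of any string it chooses. The target `TARGET` and every helper name (`label`, `uniform_example`, `membership_query`, `uniform_error`, `label_intersection`, `label_complement`) are hypothetical illustrations, not the paper's construction. Python's `re` has no complement or intersection operators, so those extensions are emulated here by combining labels.

```python
import random
import re
from itertools import product
from typing import Callable

# Hypothetical target concept chosen for illustration (not from the paper):
# binary strings that contain "11" somewhere.
TARGET = re.compile(r"(0|1)*11(0|1)*")


def label(x: str, pattern: re.Pattern = TARGET) -> int:
    """Return 1 iff the entire string x belongs to L(pattern), else 0."""
    return int(pattern.fullmatch(x) is not None)


def uniform_example(n: int, rng: random.Random) -> tuple[str, int]:
    """PAC example oracle: draw x uniformly from {0,1}^n and return (x, label)."""
    x = "".join(rng.choice("01") for _ in range(n))
    return x, label(x)


def membership_query(x: str) -> int:
    """Membership-query oracle: the learner chooses x and receives its label."""
    return label(x)


def uniform_error(hypothesis: Callable[[str], int], n: int) -> float:
    """Exact error of a hypothesis under the uniform distribution on {0,1}^n.

    The hypothesis is an arbitrary predicate, not necessarily a regular
    expression, which is what improper learning allows.
    """
    points = ["".join(bits) for bits in product("01", repeat=n)]
    mistakes = sum(hypothesis(x) != label(x) for x in points)
    return mistakes / len(points)


# Assumed emulation of extended expressions: intersection and complement
# are applied to membership labels rather than parsed as regex operators.
def label_intersection(x: str, *patterns: re.Pattern) -> int:
    """1 iff x is in every L(p); models an intersection operator."""
    return int(all(p.fullmatch(x) is not None for p in patterns))


def label_complement(x: str, pattern: re.Pattern) -> int:
    """1 iff x is NOT in L(pattern); models a complement operator."""
    return 1 - label(x, pattern)
```

For instance, `uniform_example(8, random.Random(0))` returns one labeled example, and `uniform_error(lambda x: int("11" in x), 6)` is 0.0 because that hypothesis computes the target exactly. The exhaustive enumeration in `uniform_error` costs 2^n evaluations, so it is only meant for small n; computational hardness is about learners that must run in polynomial time.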

Despite the theoretical significance and wide practical use of regular expressions, the computational complexity of learning them has been largely unexplored. We study the computational hardness of improperly learning regular expressions in the PAC model and with membership queries. We show that PAC learning is hard even under the uniform distribution on the hypercube, and also prove hardness of distribution-free learning with membership queries. Furthermore, if regular expressions are extended with complement or intersection, we establish hardness of learning with membership queries even under the uniform distribution. We emphasize that these results do not follow from existing hardness results for learning DFAs or NFAs, since the descriptive complexity of regular languages can differ exponentially between DFAs, NFAs, and regular expressions.
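The exponential gap in descriptive complexity mentioned above can be seen on a classic textbook language (this example is not taken from the paper): L_n, the set of binary strings whose n-th symbol from the end is 1. It has a regular expression of size O(n), `(0|1)*1(0|1){n-1}`, and an NFA with n + 1 states, yet the subset construction reaches 2^n DFA states; it is a standard fact that these states are pairwise distinguishable, so the minimal DFA also has 2^n states. The Python sketch below counts the reachable subsets and cross-checks the NFA against Python's `re`; all function names are illustrative. It only demonstrates the DFA versus NFA/regex direction; gaps in which the regular expression is the larger representation also exist.

```python
import re
from itertools import product


def nth_from_end_regex(n: int) -> re.Pattern:
    """Regex of size O(n) for L_n (assumes n >= 1): n-th symbol from the end is 1."""
    return re.compile("(0|1)*1(0|1){%d}" % (n - 1))


def nfa_step(states: frozenset, symbol: str, n: int) -> frozenset:
    """One step of an (n+1)-state NFA for L_n.

    State 0 loops on both symbols and, on a 1, may also guess that this 1 is
    the n-th symbol from the end by moving to state 1. State i (1 <= i < n)
    moves to i + 1 on either symbol. State n accepts and has no transitions.
    """
    nxt = set()
    for q in states:
        if q == 0:
            nxt.add(0)
            if symbol == "1":
                nxt.add(1)
        elif q < n:
            nxt.add(q + 1)
    return frozenset(nxt)


def nfa_accepts(w: str, n: int) -> bool:
    """Simulate the NFA on w by tracking the set of active states."""
    states = frozenset({0})
    for c in w:
        states = nfa_step(states, c, n)
    return n in states


def reachable_dfa_states(n: int) -> int:
    """Number of state subsets reachable from {0} in the subset construction."""
    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        s = stack.pop()
        for c in "01":
            t = nfa_step(s, c, n)
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return len(seen)


if __name__ == "__main__":
    for n in range(1, 9):
        # Columns: n, NFA states (linear in n), reachable DFA states.
        print(n, n + 1, reachable_dfa_states(n))
```

The reachable subsets are exactly {0} together with any subset of {1, ..., n}, because after reading w, state i is active precisely when the i-th symbol from the end of w is 1; this is why the count doubles with each increment of n.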
