ReSyn: A Generalized Recursive Regular Expression Synthesis Framework
This work addresses the challenge of generating accurate regular expressions for real-world applications, such as data extraction and validation, by improving synthesis performance on structurally complex regexes. The contribution is incremental in that it builds on existing programming-by-example (PBE) systems.
The authors tackle the problem of synthesizing complex regular expressions from examples, where existing methods struggle with high structural complexity such as deep nesting and unions. They propose ReSyn, a synthesizer-agnostic framework that decomposes a synthesis problem into sub-problems. ReSyn yields significant accuracy gains across synthesizers and, combined with the Set2Regex synthesizer, establishes a new state of the art on real-world benchmarks.
Existing Programming-By-Example (PBE) systems often rely on simplified benchmarks that fail to capture the high structural complexity of real-world regexes, such as deeper nesting and frequent Unions. To overcome the resulting performance drop, we propose ReSyn, a synthesizer-agnostic divide-and-conquer framework that decomposes complex synthesis problems into manageable sub-problems. We also introduce Set2Regex, a parameter-efficient synthesizer that captures the permutation invariance of examples. Experimental results demonstrate that ReSyn significantly boosts accuracy across various synthesizers, and its combination with Set2Regex establishes a new state-of-the-art on a challenging real-world benchmark.
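
To make the divide-and-conquer idea concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a deliberately simple setting: only positive examples, a hypothetical rule-based base synthesizer (`toy_base`) that succeeds only when all examples share one character-class "shape", and a hypothetical decomposition rule (`split_by_shape`) that peels off one group of examples per Union branch. All function names are illustrative. The wrapper `synthesize` accepts any base synthesizer with the same signature, which is the sense in which such a framework can be synthesizer-agnostic.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# A base synthesizer maps positive examples to a regex, or None on failure.
Synthesizer = Callable[[Sequence[str]], Optional[str]]


def _token(ch: str) -> str:
    """Map one character to a coarse regex token (illustrative only)."""
    if ch in "0123456789":
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)


def shape(s: str) -> Tuple[str, ...]:
    """Collapse runs of identical tokens, e.g. '2024-01' -> (\\d, \\-, \\d)."""
    out: List[str] = []
    for ch in s:
        tok = _token(ch)
        if not out or out[-1] != tok:
            out.append(tok)
    return tuple(out)


def toy_base(positives: Sequence[str]) -> Optional[str]:
    """Hypothetical base synthesizer: succeeds only on a single shared shape."""
    shapes = {shape(s) for s in positives}
    if len(shapes) != 1:
        return None
    return "".join(tok + "+" for tok in shapes.pop())


def split_by_shape(positives: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical decomposition: separate examples matching the first shape."""
    first = shape(positives[0])
    left = [s for s in positives if shape(s) == first]
    right = [s for s in positives if shape(s) != first]
    return left, right


def synthesize(positives: Sequence[str], base: Synthesizer = toy_base) -> str:
    """Try the base synthesizer; on failure, split and join results by Union."""
    if not positives:
        raise ValueError("need at least one positive example")
    candidate = base(positives)
    if candidate is not None:
        return candidate
    left, right = split_by_shape(positives)
    if not right:  # the split made no progress, so recursion would not end
        raise ValueError("cannot decompose further")
    return f"(?:{synthesize(left, base)})|(?:{synthesize(right, base)})"


examples = ["2024-01-15", "N/A", "1999-12-31"]
pattern = synthesize(examples)
print(pattern)
print(all(re.fullmatch(pattern, s) for s in examples))
```

Here the base synthesizer fails on the full example set because the dates and `N/A` have different shapes, so the wrapper solves each group separately and joins the two sub-regexes with a Union. A realistic system would also handle negative examples and other operators such as concatenation and nesting, which this sketch omits.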