Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs: A Systematic Evaluation
This work addresses a challenge faced by software engineers: keeping textual DSL instances up to date as their grammars evolve. The contribution is incremental, since it applies existing LLM capabilities to a specific domain.
The study tackles the co-evolution of grammars and instances of textual DSLs by systematically evaluating large language models (LLMs). Performance is strong on small-scale cases (≥94% precision and recall for instances requiring fewer than 20 modified lines) but degrades with scale: Claude maintains 85% recall at 40 lines, while GPT fails on the largest instances.
Software languages evolve over time for reasons such as feature additions. When grammars evolve, textual instances that originally conformed to them may become outdated. While model-driven engineering provides many techniques for co-evolving models with metamodel changes, these approaches are not designed for textual DSLs and may lose human-relevant information such as layout and comments. This study systematically evaluates the potential of large language models (LLMs) for co-evolving grammars and instances of textual DSLs. Using Claude Sonnet 4.5 and GPT-5.2 across ten case languages with ten runs each, we assess both correctness and preservation of human-oriented information. Results show strong performance on small-scale cases ($\geq$94% precision and recall for instances requiring fewer than 20 modified lines), but performance degrades with scale: Claude maintains 85% recall at 40 lines, while GPT fails on the largest instances. Response time increases substantially with instance size, and grammar evolution complexity and deletion granularity affect performance more than change type. These findings clarify when LLM-based co-evolution is effective and where current limitations remain.
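To make the reported precision and recall figures concrete, the following is a minimal sketch of one way such line-level scores could be computed. It is an assumption for illustration, not the study's actual metric definition: the function names `line_edits` and `precision_recall`, the toy state-machine DSL, and the choice to count each removed or added line as one edit are all hypothetical. An LLM output is scored by comparing the line edits it makes against the edits in a reference evolved instance.

```python
from difflib import SequenceMatcher


def line_edits(original: str, evolved: str) -> set:
    """Return the line-level edits that turn `original` into `evolved`.

    Hypothetical representation: ("-", i) marks original line i as removed,
    and ("+", i, k, text) marks `text` as the k-th line added at original
    position i. Anchoring on original positions keeps edits comparable
    across different evolved outputs.
    """
    a = original.splitlines()
    b = evolved.splitlines()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    edits = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        for i in range(i1, i2):  # lines removed or replaced
            edits.add(("-", i))
        for k, text in enumerate(b[j1:j2]):  # lines added or replacing
            edits.add(("+", i1, k, text))
    return edits


def precision_recall(original: str, expected: str, actual: str):
    """Score the edits in `actual` against the edits in `expected`."""
    want = line_edits(original, expected)
    got = line_edits(original, actual)
    hits = len(want & got)
    precision = hits / len(got) if got else 1.0
    recall = hits / len(want) if want else 1.0
    return precision, recall


# Hypothetical grammar evolution: the keyword `state` is renamed to `node`.
ORIGINAL = """// traffic light
state Red
state Green
transition Red -> Green"""

EXPECTED = """// traffic light
node Red
node Green
transition Red -> Green"""

# Hypothetical LLM output that migrates only the first declaration.
ACTUAL = """// traffic light
node Red
state Green
transition Red -> Green"""

print(precision_recall(ORIGINAL, EXPECTED, ACTUAL))
```

In this toy case every edit the output makes is correct (precision 1.0), but it covers only half of the required edits (recall 0.5). Because edits are computed on raw lines, dropping the comment line or reflowing whitespace would count as extra edits and lower precision, which is one simple way to capture loss of human-oriented information such as layout and comments.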