SEAI | Dec 12, 2024

Kajal: Extracting Grammar of a Source Code Using Large Language Models

arXiv:2412.08842v1 | 1 citation | h-index: 4 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the time-intensive, error-prone task of DSL grammar extraction for software engineers, offering an incremental improvement through automation.

The paper tackles the problem of manually creating grammars for domain-specific languages (DSLs) by introducing Kajal, an approach that automatically infers a grammar from DSL code snippets using Large Language Models (LLMs) with prompt engineering and few-shot learning. Kajal achieves 60% accuracy with few-shot learning and 45% without it.

Understanding and extracting the grammar of a domain-specific language (DSL) is crucial for various software engineering tasks; however, manually creating these grammars is time-intensive and error-prone. This paper presents Kajal, a novel approach that automatically infers grammar from DSL code snippets by leveraging Large Language Models (LLMs) through prompt engineering and few-shot learning. Kajal dynamically constructs input prompts, using contextual information to guide the LLM in generating the corresponding grammars, which are iteratively refined through a feedback-driven approach. Our experiments show that Kajal achieves 60% accuracy with few-shot learning and 45% without it, demonstrating the significant impact of few-shot learning on the tool's effectiveness. This approach offers a promising solution for automating DSL grammar extraction, and future work will explore using smaller, open-source LLMs and testing on larger datasets to further validate Kajal's performance.
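The loop described in the abstract (build a prompt, ask the LLM for a grammar, check it, feed the error back) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the prompt wording, the `llm` and `validate` callables, the `max_rounds` limit, and every function name here are assumptions, since the abstract does not specify them.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration only; none of these names come from the paper.
Example = Tuple[str, str]                        # (DSL snippet, known grammar)
LLM = Callable[[str], str]                       # prompt -> candidate grammar text
Validator = Callable[[str, str], Optional[str]]  # (grammar, snippet) -> error or None


def build_prompt(snippet: str, examples: List[Example],
                 feedback: Optional[str] = None) -> str:
    """Assemble a prompt from few-shot examples, the target snippet,
    and (on later rounds) the error reported for the previous attempt."""
    parts = ["Infer a grammar for the DSL snippet below."]
    for code, grammar in examples:  # an empty list gives the zero-shot variant
        parts.append(f"Example snippet:\n{code}\nExample grammar:\n{grammar}")
    parts.append(f"Target snippet:\n{snippet}")
    if feedback:
        parts.append(f"The previous grammar failed with:\n{feedback}\nPlease fix it.")
    return "\n\n".join(parts)


def infer_grammar(snippet: str, examples: List[Example], llm: LLM,
                  validate: Validator, max_rounds: int = 3) -> Optional[str]:
    """Ask the LLM for a grammar and refine it until the validator accepts it
    or the round budget (an assumed parameter) is exhausted."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        grammar = llm(build_prompt(snippet, examples, feedback))
        feedback = validate(grammar, snippet)  # None means the grammar was accepted
        if feedback is None:
            return grammar
    return None  # no accepted grammar within the budget
```

In this sketch the validator would typically try to parse the original snippet with the candidate grammar and return the parser's error message on failure. Passing the LLM and the validator as plain callables keeps the loop independent of any particular model or parser generator.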

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
