AI SE · Dec 16, 2025

IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection

Roman Nekrasov, Stefano Fossati, Indika Kumara, Damian Andrew Tamburri, Willem-Jan van den Heuvel

arXiv:2512.14792v1 · h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses unreliable IaC generation, a problem that matters to DevOps engineers, although its improvement over existing methods is incremental.

This research tackled the low success rates of LLM-generated Infrastructure as Code (IaC) by systematically injecting structured configuration knowledge, which raised technical validation success to 75.3% and overall success from a 27.1% baseline to 62.6%. However, intent alignment plateaued, revealing a "Correctness-Congruence Gap": the LLMs improved as coders but remained limited as architects.
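The reported figures can be set side by side. The arithmetic below is a reading of the published percentages, not a computation taken from the paper; per the abstract, 27.1% is the baseline's overall success rate. The last two lines additionally assume that an output counts toward overall success only if it also passes technical validation, so they give a rough size for the share of technically valid outputs that still missed user intent:

```latex
\begin{aligned}
\text{overall gain} &= 62.6\% - 27.1\% = 35.5\ \text{pp} \quad (62.6/27.1 \approx 2.31\times) \\
\text{valid but not overall-successful} &= 75.3\% - 62.6\% = 12.7\ \text{pp} \\
\Pr(\text{overall success} \mid \text{technically valid}) &\approx 62.6 / 75.3 \approx 0.83
\end{aligned}
```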

Large Language Models (LLMs) currently exhibit low success rates in generating correct and intent-aligned Infrastructure as Code (IaC). This research investigated methods to improve LLM-based IaC generation, specifically for Terraform, by systematically injecting structured configuration knowledge. To facilitate this, an existing IaC-Eval benchmark was significantly enhanced with cloud emulation and automated error analysis. Additionally, a novel error taxonomy for LLM-assisted IaC code generation was developed. A series of knowledge injection techniques was implemented and evaluated, progressing from Naive Retrieval-Augmented Generation (RAG) to more sophisticated Graph RAG approaches. These included semantic enrichment of graph components and modeling inter-resource dependencies. Experimental results demonstrated that while baseline LLM performance was poor (27.1% overall success), injecting structured configuration knowledge increased technical validation success to 75.3% and overall success to 62.6%. Despite these gains in technical correctness, intent alignment plateaued, revealing a "Correctness-Congruence Gap" where LLMs can become proficient "coders" but remain limited "architects" in fulfilling nuanced user intent.
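The progression from Naive RAG to Graph RAG with inter-resource dependencies can be illustrated with a minimal sketch. It assumes a toy knowledge graph in which each node is a Terraform AWS resource type carrying a short documentation snippet and each edge is a dependency on another resource. The snippets, the edges, the keyword matching, and all function names (`seed_resources`, `naive_rag`, `graph_rag`, `build_context`) are hypothetical illustrations, not the paper's retrieval pipeline and not official provider documentation. Naive retrieval returns only the resources named in the request, while the graph variant also walks dependency edges so the prompt carries configuration for the resources those depend on:

```python
from collections import deque

# Hypothetical toy knowledge base: one illustrative doc snippet per Terraform
# AWS resource type (placeholders, not official provider documentation).
DOCS: dict[str, str] = {
    "aws_vpc": "aws_vpc: set cidr_block.",
    "aws_subnet": "aws_subnet: set vpc_id and cidr_block.",
    "aws_security_group": "aws_security_group: set vpc_id to attach it to a VPC.",
    "aws_instance": "aws_instance: set ami, instance_type, subnet_id, vpc_security_group_ids.",
}

# Illustrative inter-resource dependency edges: resource -> resources it references.
DEPENDS_ON: dict[str, list[str]] = {
    "aws_instance": ["aws_subnet", "aws_security_group"],
    "aws_subnet": ["aws_vpc"],
    "aws_security_group": ["aws_vpc"],
    "aws_vpc": [],
}


def seed_resources(query: str) -> list[str]:
    """Keyword match: a resource matches if every word of its name after the
    provider prefix (e.g. 'security', 'group') appears in the query."""
    words = set(query.lower().split())
    return [r for r in DOCS if all(t in words for t in r.split("_")[1:])]


def naive_rag(query: str) -> list[str]:
    """Naive retrieval: only the directly matched resources."""
    return seed_resources(query)


def graph_rag(query: str, max_depth: int = 2) -> list[str]:
    """Graph retrieval: matched resources plus their dependencies, found by a
    breadth-first walk over DEPENDS_ON up to max_depth hops."""
    seeds = seed_resources(query)
    seen = set(seeds)
    ordered = list(seeds)
    queue = deque((r, 0) for r in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth cap
        for dep in DEPENDS_ON.get(node, []):
            if dep not in seen:
                seen.add(dep)
                ordered.append(dep)
                queue.append((dep, depth + 1))
    return ordered


def build_context(resources: list[str]) -> str:
    """Concatenate the retrieved snippets into prompt context for the LLM."""
    return "\n".join(DOCS[r] for r in resources)
```

For example, `graph_rag("create an instance")` returns `['aws_instance', 'aws_subnet', 'aws_security_group', 'aws_vpc']`, whereas `naive_rag("create an instance")` returns only `['aws_instance']`, so only the graph context mentions the `vpc_id` the subnet needs. The depth-capped breadth-first walk is one simple way to bound context size; the semantic enrichment of graph components that the paper also evaluates is not modeled here.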
