LG · AI · CL · Feb 7, 2025

Can Large Language Models Understand Intermediate Representations in Compilers?

arXiv:2502.06854v2 · 2 citations · h-index: 6 · ICML
Originality: Incremental advance
AI Analysis

This research examines how well Large Language Models (LLMs) understand Intermediate Representations (IRs) in compilers. The question matters for programming language processing and compiler design, and especially for developers and researchers working on compiler optimization and program analysis.

The study evaluates six state-of-the-art LLMs on IR comprehension. The models can parse IR syntax and identify high-level structures, but they struggle with instruction-level reasoning: they consistently mishandle control flow, loops, and dynamic execution, with failure modes that include misinterpreting branching instructions and omitting critical operations.

Intermediate Representations (IRs) play a critical role in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. In this paper, we present an explorative empirical study evaluating the capabilities of six state-of-the-art LLMs: GPT-4, GPT-3, DeepSeek, Gemma 2, Llama 3, and Code Llama, in understanding IRs. Specifically, we assess model performance across four core tasks: control flow graph reconstruction, decompilation, code summarization, and execution reasoning. While LLMs exhibit competence in parsing IR syntax and identifying high-level structures, they consistently struggle with instruction-level reasoning, especially in control flow reasoning, loop handling, and dynamic execution. Common failure modes include misinterpreting branching instructions, omitting critical operations, and relying on heuristic reasoning rather than precise instruction-level logic. Our findings highlight the need for IR-specific enhancements in LLM design. We recommend fine-tuning on structured IR datasets and integrating control-flow-sensitive architectures to improve model effectiveness. All experimental data and source code are publicly available at
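To make the control flow graph reconstruction task concrete, here is a minimal sketch of the ground truth a model would be asked to produce: it reads a small function in an LLVM-style textual IR, maps each basic block to its successors from `br` terminators, and then finds the loop back edge with a depth-first search. The IR snippet, the function `sum_to`, and the helpers `build_cfg` and `back_edges` are hypothetical illustrations, not the paper's benchmark or code. The sketch assumes one instruction per line and handles only `br` and `ret` terminators (no `switch`, `invoke`, or `indirectbr`). The conditional `br` with two targets is the kind of branching instruction the study reports models misreading.

```python
import re

# Hypothetical LLVM-style IR: sums 0..n-1 with a loop (illustration only).
IR = """
define i32 @sum_to(i32 %n) {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %body ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %body ]
  %cond = icmp slt i32 %i, %n
  br i1 %cond, label %body, label %exit
body:
  %acc.next = add i32 %acc, %i
  %i.next = add i32 %i, 1
  br label %loop
exit:
  ret i32 %acc
}
"""

LABEL_RE = re.compile(r"^([A-Za-z0-9_.$-]+):")        # block label, e.g. "loop:"
TARGET_RE = re.compile(r"label %([A-Za-z0-9_.$-]+)")  # branch target, e.g. "label %exit"


def build_cfg(ir_text):
    """Map each basic-block label to its successor labels (assumes br/ret terminators only)."""
    cfg = {}
    current = None
    for raw in ir_text.splitlines():
        line = raw.strip()
        m = LABEL_RE.match(line)
        if m:
            current = m.group(1)
            cfg[current] = []
            continue
        if current is not None and line.startswith("br "):
            # Unconditional br has one target; conditional br has two (true, false).
            cfg[current].extend(TARGET_RE.findall(line))
        # "ret" ends a block with no successors, so nothing is recorded.
    return cfg


def back_edges(cfg, entry):
    """Edges u->v found by DFS where v is still on the stack (loop back edges in a reducible CFG)."""
    edges, on_stack, visited = [], set(), set()

    def dfs(u):
        visited.add(u)
        on_stack.add(u)
        for v in cfg[u]:
            if v in on_stack:
                edges.append((u, v))
            elif v not in visited:
                dfs(v)
        on_stack.discard(u)

    dfs(entry)
    return edges


cfg = build_cfg(IR)
print(cfg)                       # {'entry': ['loop'], 'loop': ['body', 'exit'], 'body': ['loop'], 'exit': []}
print(back_edges(cfg, "entry"))  # [('body', 'loop')]
```

A model that drops the `exit` successor of `loop`, or misses the `body -> loop` back edge, exhibits the control-flow and loop-handling failures described above. A regex-based reader like this is only adequate for toy inputs; real tooling would use a proper IR parser.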

