SE AI PL · Mar 31, 2025

Assessing Code Understanding in LLMs

Cosimo Laneve, Alvise Spanò, Dalila Ressi, Sabina Rossi, Michele Bugliesi

arXiv:2504.00065v18.02 citations · h-index: 29 · Has Code · FORTE

Originality: Synthesis-oriented

AI Analysis

This work addresses how reliably LLMs understand code, a concern for developers who depend on them, but its contribution is incremental because it builds on existing evaluation methods.

The paper evaluates how well Large Language Models understand code by testing whether they can judge semantic equivalence between a program and its transformed version. The models fail in approximately 41% of cases without context and in 29% of cases with a simple generic context. The authors advocate integrating LLMs with code-optimization tools to improve accuracy.

We present an empirical evaluation of Large Language Models in code understanding associated with non-trivial, semantic-preserving program transformations such as copy propagation or constant folding. Our findings show that LLMs fail to judge semantic equivalence in approximately 41% of cases when no context is provided and in 29% when given a simple generic context. To improve accuracy, we advocate integrating LLMs with code-optimization tools to enhance training and facilitate more robust program understanding.
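To make the two named transformations concrete, here is a minimal Python sketch of a program before and after copy propagation and constant folding. The functions `original`, `transformed`, and `agree_on` are hypothetical illustrations and are not taken from the paper's benchmark or tooling. The check at the end only compares outputs on a finite set of inputs, which is evidence of equivalence rather than a proof.

```python
def original(x: int) -> int:
    a = x            # 'a' is a plain copy of 'x'
    b = a + 2 * 3    # uses the copy and a constant subexpression
    c = b            # 'c' is a plain copy of 'b'
    return c * a


def transformed(x: int) -> int:
    # Copy propagation: every use of 'a' becomes 'x', every use of 'c' becomes 'b'.
    # Constant folding: the constant subexpression 2 * 3 becomes 6.
    b = x + 6
    return b * x


def agree_on(samples) -> bool:
    """Return True if both versions give the same result on every sample input."""
    return all(original(x) == transformed(x) for x in samples)


if __name__ == "__main__":
    # Sampled agreement only; it does not establish equivalence for all inputs.
    print(agree_on(range(-50, 51)))
```

A pair like this is the kind of input on which an equivalence judgment would be requested: the two functions differ textually but compute the same value for every integer `x`.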

