SE | AI | LG | PL | Feb 8, 2024

Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates

arXiv:2402.05980v3 | 17 citations | h-index: 29 | ICML
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretability in AI-driven code generation by providing a method to evaluate model understanding. The advance is incremental, since it builds on existing evaluation techniques.

The authors ask whether large code models comprehend programming concepts. They propose a counterfactual testing framework and report evidence that current models lack understanding of concepts such as data flow and control flow.

Large Language Models' success on text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.
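
The idea of a counterfactual test with black-box access can be illustrated with a minimal sketch. It assumes one plausible control-flow mutation (negate an if-condition and swap its branches, which preserves meaning) and is not necessarily the mutation or the scoring the authors use. All names here (`FlipIfElse`, `flip_if_else`, `run`, `is_consistent`) are hypothetical, and `model` stands for any callable that maps a prompt to a completed program.

```python
import ast
from typing import Callable, Iterable


class FlipIfElse(ast.NodeTransformer):
    """Hypothetical mutation: negate each if-condition and swap its branches.

    The rewritten program behaves the same as the original, so a model that
    tracks control flow should treat both versions equivalently.
    """

    def visit_If(self, node: ast.If) -> ast.If:
        self.generic_visit(node)  # rewrite nested if-statements first
        if node.orelse:  # only flip when an else (or elif) branch exists
            node.test = ast.UnaryOp(op=ast.Not(), operand=node.test)
            node.body, node.orelse = node.orelse, node.body
        return node


def flip_if_else(source: str) -> str:
    """Return the counterfactual version of `source` (needs Python 3.9+)."""
    tree = FlipIfElse().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def run(source: str, func: str, *args):
    """Execute `source` and call the function named `func` with `args`."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace[func](*args)


def is_consistent(
    model: Callable[[str], str],
    program: str,
    func: str,
    inputs: Iterable,
) -> bool:
    """Black-box check: does the model's output agree on both versions?

    Assumption: `model` returns a complete program defining `func`.
    """
    original = model(program)
    mutated = model(flip_if_else(program))
    return all(run(original, func, x) == run(mutated, func, x) for x in inputs)


PROGRAM = """
def sign(x):
    if x >= 0:
        return 1
    else:
        return -1
"""

# Stand-in for a code model: it returns its prompt unchanged.
identity_model = lambda prompt: prompt
consistent = is_consistent(identity_model, PROGRAM, "sign", [-2, 0, 5])
```

Here the stand-in model is trivially consistent because the mutation preserves behaviour. With a real code model, a disagreement between the two completions would suggest that the model is sensitive to surface form rather than to the underlying control flow.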

