CL · Feb 7, 2025

CodeSCM: Causal Analysis for Multi-Modal Code Generation

arXiv:2502.05150v1 · 12 citations · h-index: 3 · NAACL
Originality: Highly original
AI Analysis

This research addresses the problem of understanding how different prompt modalities causally affect code generation by large language models (LLMs), a question relevant to developers and researchers who work with these models. It is an incremental step toward more explainable code generation.

The authors analyze multi-modal code generation with LLMs and find that, in addition to natural language instructions, input-output examples significantly influence the generated code. Their model, CodeSCM, quantifies direct effects that represent the LLM's spurious leanings.
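One way to make a claim such as "input-output examples significantly influence code generation" concrete is to treat a prompt transformation (for example, removing the input-output examples) as an intervention and compare pass rates on the same problems with and without it. The minimal sketch below assumes per-problem pass/fail outcomes and a simple difference-in-means estimator of the total effect; the function name, the inputs, and the estimator are illustrative assumptions, not CodeSCM's actual implementation, and the sample outcomes are hypothetical.

```python
from statistics import mean


def total_effect(pass_original: list[bool], pass_intervened: list[bool]) -> float:
    """Estimate TE = P(pass | do(prompt = original)) - P(pass | do(prompt = intervened)).

    Hypothetical helper: both lists must be aligned per problem, i.e. the same
    benchmark problems in the same order, once with the original prompt and
    once with the intervened prompt (e.g. input-output examples removed).
    """
    if len(pass_original) != len(pass_intervened):
        raise ValueError("outcome lists must be aligned per problem")
    return mean(map(float, pass_original)) - mean(map(float, pass_intervened))


if __name__ == "__main__":
    # Hypothetical pass/fail outcomes for five problems (illustrative only, not from the paper).
    original = [True, True, False, True, True]
    no_io_examples = [True, False, False, False, True]
    # A positive value means removing the modality lowered the pass rate.
    print(total_effect(original, no_io_examples))  # 0.4
```

A difference in means is the simplest estimator; with real benchmarks one would also want an uncertainty estimate (for example, a paired bootstrap over problems) before calling an effect significant.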

In this paper, we propose CodeSCM, a Structural Causal Model (SCM) for analyzing multi-modal code generation using large language models (LLMs). By applying interventions to CodeSCM, we measure the causal effects of different prompt modalities, such as natural language, code, and input-output examples, on the model. CodeSCM introduces latent mediator variables to separate the code and natural language semantics of a multi-modal code generation prompt. Using the principles of Causal Mediation Analysis on these mediators, we quantify direct effects representing the model's spurious leanings. We find that, in addition to natural language instructions, input-output examples significantly influence code generation.
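The mediation idea in the abstract can be illustrated on a toy linear SCM where a prompt modality X affects the outcome Y both through a latent mediator M and directly. The sketch below computes the total effect (TE), natural direct effect (NDE), and natural indirect effect (NIE) by simulating counterfactuals with shared exogenous noise per unit. The class, its coefficients `a`, `b`, `c`, and the single scalar mediator are hypothetical simplifications; CodeSCM's actual graph has separate code and natural-language mediators over LLM outputs.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyLinearSCM:
    """Hypothetical linear SCM: X -> M -> Y plus a direct edge X -> Y.

    X: 1 if a prompt modality is present, 0 if it is intervened away.
    M: a scalar latent "semantic" mediator (stand-in for CodeSCM's mediators).
    Y: a scalar outcome, e.g. a correctness score.
    """

    a: float  # effect of X on M
    b: float  # effect of M on Y
    c: float  # direct effect of X on Y, bypassing M

    def mediator(self, x: float, u_m: float) -> float:
        return self.a * x + u_m

    def outcome(self, x: float, m: float, u_y: float) -> float:
        return self.c * x + self.b * m + u_y


def mediation_effects(scm: ToyLinearSCM, n: int = 10_000, seed: int = 0) -> dict[str, float]:
    """Monte Carlo estimates of TE, NDE and NIE, sharing exogenous noise across worlds."""
    rng = random.Random(seed)
    te = nde = nie = 0.0
    for _ in range(n):
        u_m, u_y = rng.gauss(0, 1), rng.gauss(0, 1)
        m0 = scm.mediator(0, u_m)  # mediator under do(X=0)
        m1 = scm.mediator(1, u_m)  # mediator under do(X=1)
        y0_m0 = scm.outcome(0, m0, u_y)  # Y(0, M(0))
        y1_m0 = scm.outcome(1, m0, u_y)  # cross-world counterfactual Y(1, M(0))
        y1_m1 = scm.outcome(1, m1, u_y)  # Y(1, M(1))
        te += y1_m1 - y0_m0
        nde += y1_m0 - y0_m0  # change in X only, mediator held at M(0)
        nie += y1_m1 - y1_m0  # change in mediator only, X held at 1
    return {"TE": te / n, "NDE": nde / n, "NIE": nie / n}


if __name__ == "__main__":
    effects = mediation_effects(ToyLinearSCM(a=0.5, b=2.0, c=0.3))
    print({k: round(v, 6) for k, v in effects.items()})
    # {'TE': 1.3, 'NDE': 0.3, 'NIE': 1.0}
```

In this toy the NDE (here equal to `c`) plays the role the abstract assigns to direct effects: the part of the modality's influence that bypasses the semantic mediator. Because the structural equations have no X-by-M interaction, TE = NDE + NIE holds exactly; with interactions the decomposition generally requires more care.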

Code Implementations: 1 repo
