SE · AI · May 12

Uncertainty Quantification for LLM-based Code Generation

arXiv:2605.12201
AI Analysis

This work addresses the need for reliable uncertainty quantification in LLM-based code generation, a critical problem for developers relying on such models.

RisCoSet uses multiple hypothesis testing to construct risk-controlling prediction sets for LLM-based code generation; each set is guaranteed to contain a correct solution with high confidence. At the same risk level, it reduces code removal by up to 24.5% compared with the state of the art.

Prediction sets provide a theoretically grounded framework for quantifying uncertainty in machine learning models. Adapting them to structured generation tasks, in particular large language model (LLM) based code generation, remains a challenging problem. An existing attempt proposes PAC prediction sets but is limited by its strong monotonicity assumption on risk and its single-label classification framework, which severely restrict the space of candidate programs and cannot accommodate the multiple valid outputs inherent to code generation. To address these limitations, we propose RisCoSet, an approach that leverages multiple hypothesis testing to construct risk-controlling prediction sets for LLM-based code generation. Given a trained code generation model, we produce a prediction set represented by a partial program, which is guaranteed to contain a correct solution with high confidence. Extensive experiments on three LLMs demonstrate the effectiveness of the proposed method. For instance, compared with the state of the art, our method reduces code removal by up to 24.5% at the same level of risk.
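To make the idea of risk control via multiple hypothesis testing concrete, here is a minimal sketch in Python. It is not the RisCoSet algorithm from the paper. It is a generic Bonferroni-corrected procedure, assuming a finite list of candidate pruning levels and a 0/1 loss on each calibration example (1 if the partial program at that level no longer contains any correct solution). The names `binomial_tail_pvalue`, `select_valid_candidates`, and `loss_table` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def binomial_tail_pvalue(errors: int, n: int, alpha: float) -> float:
    """P(Binomial(n, alpha) <= errors).

    A valid p-value for the null hypothesis "true risk > alpha" when the
    per-example losses are i.i.d. 0/1 values.
    """
    return sum(
        math.comb(n, k) * alpha**k * (1 - alpha) ** (n - k)
        for k in range(errors + 1)
    )


def select_valid_candidates(
    loss_table: Sequence[Sequence[int]], alpha: float, delta: float
) -> List[int]:
    """Return indices of candidates whose risk is certified to be <= alpha.

    loss_table[j][i] is the 0/1 loss of candidate j on calibration example i
    (an assumed encoding). With a Bonferroni correction over the candidates,
    all returned candidates have risk <= alpha simultaneously with
    probability at least 1 - delta.
    """
    m = len(loss_table)
    valid = []
    for j, losses in enumerate(loss_table):
        p_value = binomial_tail_pvalue(sum(losses), len(losses), alpha)
        if p_value <= delta / m:  # Bonferroni: split delta across m tests
            valid.append(j)
    return valid


# Toy calibration data: three pruning levels, 100 calibration examples each.
toy_losses = [
    [0] * 100,             # never drops every correct solution
    [1] * 5 + [0] * 95,    # fails on 5% of the examples
    [1] * 30 + [0] * 70,   # fails on 30% of the examples
]
certified = select_valid_candidates(toy_losses, alpha=0.2, delta=0.1)
```

Among the certified candidates, one would then pick the level that removes the least code. Because each candidate is tested separately, this style of procedure needs no assumption that risk is monotone in the pruning level, which is the limitation the abstract attributes to the earlier PAC approach.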
