LG · CL · STML · Dec 1, 2024

Quantifying perturbation impacts for large language models

arXiv:2412.00868v1 · 8 citations · h-index: 74
Originality: Incremental advance
AI Analysis

The paper addresses model reliability and interpretability for LLM users, but the advance is incremental because it builds on existing perturbation analysis methods.

The paper tackles the problem of quantifying how input perturbations affect large language model outputs by introducing Distribution-Based Perturbation Analysis (DBPA), a framework that reformulates this as a frequentist hypothesis testing problem to disentangle meaningful changes from stochasticity, resulting in interpretable p-values and effect sizes.

We consider the problem of quantifying how an input perturbation impacts the outputs of large language models (LLMs), a fundamental task for model reliability and post-hoc interpretability. A key obstacle in this domain is disentangling the meaningful changes in model responses from the intrinsic stochasticity of LLM outputs. To overcome this, we introduce Distribution-Based Perturbation Analysis (DBPA), a framework that reformulates LLM perturbation analysis as a frequentist hypothesis testing problem. DBPA constructs empirical null and alternative output distributions within a low-dimensional semantic similarity space via Monte Carlo sampling. Comparisons of Monte Carlo estimates in the reduced dimensionality space enable tractable frequentist inference without relying on restrictive distributional assumptions. The framework is model-agnostic, supports the evaluation of arbitrary input perturbations on any black-box LLM, yields interpretable p-values, supports multiple perturbation testing via controlled error rates, and provides scalar effect sizes for any chosen similarity or distance metric. We demonstrate the effectiveness of DBPA in evaluating perturbation impacts, showing its versatility for perturbation analysis.
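
The testing idea in the abstract can be illustrated with a minimal sketch. This is one possible reading of the description, not the authors' implementation. The function names (`jaccard`, `perturbation_test`), the token-overlap similarity, and the use of a one-sided permutation test on a difference in means are all assumptions made for illustration. In practice the response lists would come from repeated sampling of a black-box LLM, and the similarity would more likely be an embedding-based metric.

```python
import random
from statistics import mean


def jaccard(a, b):
    """Toy similarity in [0, 1]: overlap of lowercase token sets.
    Assumed stand-in for a semantic similarity metric."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def perturbation_test(base, rerun, perturbed, similarity=jaccard,
                      n_perm=2000, seed=0):
    """Hypothetical helper: compare perturbed responses against the
    model's own run-to-run variability.

    base      -- responses sampled from the original prompt (reference set)
    rerun     -- a second, independent sample from the original prompt
    perturbed -- responses sampled from the perturbed prompt

    Returns (effect_size, p_value). The effect size is the drop in mean
    similarity to `base`; the p-value is one-sided (perturbation lowers
    similarity).
    """
    def score(response):
        # Project one response to a scalar: mean similarity to the reference set.
        return mean(similarity(b, response) for b in base)

    null_scores = [score(y) for y in rerun]      # empirical null distribution
    alt_scores = [score(y) for y in perturbed]   # empirical alternative distribution
    observed = mean(null_scores) - mean(alt_scores)

    # Permutation test: under the null, rerun and perturbed responses are
    # exchangeable, so relabeling their scores should not change the statistic.
    rng = random.Random(seed)
    pooled = null_scores + alt_scores
    k = len(null_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + hits) / (1 + n_perm)
    return observed, p_value
```

A call such as `perturbation_test(base, rerun, perturbed)` returns a scalar effect size and a p-value for a single perturbation. When several perturbations are tested, the resulting p-values would still need a multiple-testing correction (for example Bonferroni) to control the overall error rate.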

