LG ML May 28

On the Construction and Implications of Low-Loss Valleys in LoRA-based Bayesian Inference

Daniel Dold, Emanuel Sommer, Julius Kobialka, Oliver Dürr, David Rügamer

arXiv:2605.2958067.2

AI Analysis

This work addresses the challenge of epistemic uncertainty estimation in parameter-efficient fine-tuning of large language models, providing a method to achieve functional diversity beyond discrete ensembles.

The paper introduces LoRA-Curve, a segmented Bézier curve parameterization for LoRA-based fine-tuning that connects independent optima through continuous low-loss valleys. On reasoning and classification benchmarks with Qwen2.5 7B, the approach yields higher mutual information in the predictive distribution without sacrificing performance.

While parameter-efficient fine-tuning methods like low-rank adaptation (LoRA) are standard for large language models, principled estimation of epistemic uncertainty remains challenging. Recent results in the LoRA regime suggest that discrete multi-mode approaches such as deep ensembles offer little benefit over single-mode methods. This contradicts broader observations in deep learning, where ensembling independent optima typically improves generalization, and linking these modes through continuous low-loss valleys further enhances Bayesian model averaging (BMA). Whether such structure exists in the LoRA space and whether it yields functional diversity missed by local or discrete methods has not been studied. We introduce LoRA-Curve, a segmented Bézier curve parameterization in the LoRA space, with two variants: a free configuration that jointly optimizes all control points, and an anchored configuration that connects independently fine-tuned LoRA optima. We prove pathwise continuity and Lipschitz regularity of the loss along the curve and empirically show, across reasoning and classification benchmarks with Qwen2.5 7B, that linear interpolation encounters loss barriers, while our anchored multi-segment curves connect independent optima through continuous low-loss valleys. Combined with flat-minima perturbations and a Jensen-Shannon divergence regularizer, LoRA-Curve yields measurably higher mutual information of the predictive distribution without sacrificing performance, and links continuous parameter-space traversal to functional diversity.
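The two central objects in the abstract, a segmented Bézier curve through LoRA optima and the mutual information of the predictive distribution, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the segments are taken to be quadratic, LoRA parameters are flattened into plain vectors, and the names `bezier_segment`, `segmented_curve`, and `mutual_information` are hypothetical. Mutual information is computed with the standard decomposition: entropy of the averaged prediction minus the average entropy of the individual predictions.

```python
import numpy as np


def bezier_segment(p0, p1, p2, s):
    """Point on a quadratic Bezier segment at local parameter s in [0, 1].

    p0 and p2 are the segment endpoints, p1 is the interior control point.
    Quadratic order is an assumption of this sketch.
    """
    return (1 - s) ** 2 * p0 + 2 * s * (1 - s) * p1 + s ** 2 * p2


def segmented_curve(anchors, controls, t):
    """Point on a piecewise quadratic Bezier curve at global t in [0, 1].

    anchors:  (K + 1, D) array, e.g. flattened LoRA optima (hypothetical layout).
    controls: (K, D) array, one interior control point per segment.
    Consecutive segments share an anchor, so the curve is continuous.
    """
    anchors = np.asarray(anchors, dtype=float)
    controls = np.asarray(controls, dtype=float)
    k = len(controls)
    # Map the global parameter to a segment index and a local parameter.
    idx = min(int(t * k), k - 1)
    s = t * k - idx
    return bezier_segment(anchors[idx], controls[idx], anchors[idx + 1], s)


def mutual_information(probs, eps=1e-12):
    """Mutual information between prediction and model for a single input.

    probs: (M, C) array of class probabilities from M models, for example
    M parameter vectors sampled along the curve.
    Returns H(mean_m p_m) - mean_m H(p_m), in nats.
    """
    probs = np.asarray(probs, dtype=float)
    mean = probs.mean(axis=0)
    entropy_of_mean = -np.sum(mean * np.log(mean + eps))
    mean_of_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return entropy_of_mean - mean_of_entropy


# Toy example: three anchors in a 2-D "parameter space", two segments.
anchors = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
controls = np.array([[0.0, 1.0], [2.0, 1.0]])
path = np.stack([segmented_curve(anchors, controls, t)
                 for t in np.linspace(0.0, 1.0, 5)])
```

In this sketch the anchored configuration would correspond to holding `anchors` fixed and training only `controls`, while the free configuration would train both; that mapping is an interpretation of the abstract, not a detail it states. Models that agree give a mutual information near zero, and models that disagree confidently give larger values.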
