CL · AI · LG · May 9

FragileFlow: Spectral Control of Correct-but-Fragile Predictions for Foundation Model Robustness

arXiv:2605.08896 · 68.7
AI Analysis

For practitioners deploying LLMs and VLMs, this work addresses a structured failure mode that average accuracy metrics hide, with the aim of improving worst-class robustness under perturbation.

The paper formalizes "margin-aware error flow" in LLMs and VLMs: a prediction remains correct, but probability mass shifts from the true class toward systematic wrong competitors near the decision boundary. FragileFlow, a plug-in regularizer, uses a calibrated margin buffer to identify these correct-but-fragile predictions and applies spectral control to a class-wise vulnerable-risk matrix. In the reported experiments it improves perturbed worst-class accuracy in most settings while preserving clean accuracy.

Robust adaptation of LLMs and VLMs is often evaluated by average accuracy or average consistency under perturbations. However, these averages can hide a structured failure mode: a prediction may remain correct while probability mass already flows from particular true classes toward systematic wrong competitors near the decision boundary. In this paper, we formalize this phenomenon as margin-aware error flow and introduce FragileFlow, a plug-in regularizer that uses a calibrated margin buffer to identify correct-but-fragile predictions and organize their off-class probability mass into a class-wise vulnerable-risk matrix. Theoretically, we provide the first PAC-Bayes upper bound for this margin-aware error-flow object, showing how empirical spectral control yields a conservative route to deterministic worst-class robustness under a stability condition. Experiments on multiple-choice LLM benchmarks and few-shot CLIP adaptation show that FragileFlow consistently improves the proposed theory-facing risk measures over matched baselines, yields perturbed worst-class accuracy gains in most settings, and preserves clean accuracy across comparisons.
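The abstract does not give the exact definitions, so the following is only a minimal NumPy sketch of the idea it describes, under stated assumptions: a prediction counts as correct-but-fragile when its probability margin (true-class probability minus the strongest wrong-class probability) is positive but below a buffer, the off-class probability mass of those predictions is averaged per true class into a matrix, and the largest singular value of that matrix is the quantity a regularizer could penalize. The names `vulnerable_risk_matrix`, `spectral_norm`, and `margin_buffer`, the normalization by class counts, and the fixed buffer value are all hypothetical and are not taken from the paper.

```python
import numpy as np


def vulnerable_risk_matrix(probs, labels, margin_buffer):
    """Illustrative only: rows index the true class, columns the competitor.

    Assumption: "fragile" means 0 < margin < margin_buffer, where margin is
    p(true class) minus the largest wrong-class probability.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n, k = probs.shape
    idx = np.arange(n)

    # Margin between the true class and its strongest wrong competitor.
    p_true = probs[idx, labels]
    masked = probs.copy()
    masked[idx, labels] = -np.inf
    margin = p_true - masked.max(axis=1)

    # Correct (margin > 0) but inside the buffer.
    fragile = (margin > 0) & (margin < margin_buffer)

    # Off-class probability mass: zero out the true-class entry.
    off_class = probs.copy()
    off_class[idx, labels] = 0.0

    # Accumulate fragile off-class mass by true class, then average over
    # all samples of that class (a normalization chosen for this sketch).
    risk = np.zeros((k, k))
    np.add.at(risk, labels[fragile], off_class[fragile])
    counts = np.bincount(labels, minlength=k)
    seen = counts > 0
    risk[seen] /= counts[seen, None]
    return risk, fragile


def spectral_norm(matrix):
    """Largest singular value of the matrix."""
    return float(np.linalg.norm(matrix, ord=2))


# Toy batch: 4 samples, 3 classes.
probs = np.array([
    [0.55, 0.40, 0.05],  # true 0, correct, small margin
    [0.90, 0.05, 0.05],  # true 0, correct, large margin
    [0.30, 0.60, 0.10],  # true 0, misclassified
    [0.10, 0.48, 0.42],  # true 1, correct, small margin
])
labels = np.array([0, 0, 0, 1])
risk, fragile = vulnerable_risk_matrix(probs, labels, margin_buffer=0.2)
penalty = spectral_norm(risk)
```

In this toy batch only the first and last samples fall inside the buffer, so only they contribute to `risk`. The confidently correct and the already misclassified samples are ignored, which is the distinction from an average-accuracy view that the abstract emphasizes. In training, a penalty of this kind would be computed on differentiable model outputs and added to the task loss. How the buffer is calibrated and how the penalty is weighted are not specified in the text above.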

