Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once

arXiv:2604.01504 · 70.4 · h-index: 1

AI Analysis

For researchers and practitioners, this work addresses the inconsistent terminology and evaluation of output variation in LLM research, offering a structured approach to context-aware analysis.

The paper tackles the fragmented study of output variation in LLMs by introducing a framework that sorts tasks into four normative contexts. It then analyzes how optimizing for one objective can harm others, for example how safety improvements can reduce demographic representation.

Research on Large Language Models (LLMs) studies output variation across generation, reasoning, alignment, and representational analysis, often under the umbrella of "diversity." Yet the terminology remains fragmented, largely because the normative objectives underlying tasks are rarely made explicit. We introduce the Magic, Madness, Heaven, Sin framework, which models output variation along a homogeneity-heterogeneity axis, where the value of variation is determined by the task and its normative objective. We organize tasks into four normative contexts: epistemic (factuality), interactional (user utility), societal (representation), and safety (robustness). For each context, we examine the failure modes and vocabulary (such as hallucination, mode collapse, bias, and erasure) through which variation is studied. We apply the framework to analyze all pairwise cross-contextual interactions, revealing that optimizing for one objective, such as improving safety, can inadvertently harm demographic representation or creative diversity. We argue for context-aware evaluation of output variation, reframing it as a property shaped by task objectives rather than an intrinsic trait of the model.
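The abstract's central claim, that the same measured variation can be desirable in one context and a failure in another, can be illustrated with a small sketch. It assumes outputs are compared by exact string match and measures heterogeneity as normalized Shannon entropy. The `heterogeneity` and `value` functions and the per-context preference table are hypothetical illustrations, not the framework's actual metrics or the authors' valuation.

```python
import math
from collections import Counter
from enum import Enum


class Context(Enum):
    """The four normative contexts named in the abstract."""
    EPISTEMIC = "factuality"
    INTERACTIONAL = "user utility"
    SOCIETAL = "representation"
    SAFETY = "robustness"


# Hypothetical default: which end of the homogeneity-heterogeneity axis each
# context rewards. This mapping is an illustrative assumption, not the paper's.
PREFERS_HETEROGENEITY = {
    Context.EPISTEMIC: False,     # consistent factual answers; variation may signal hallucination
    Context.INTERACTIONAL: True,  # varied responses; collapsing to one mode is a failure
    Context.SOCIETAL: True,       # broad representation; collapse may signal bias or erasure
    Context.SAFETY: False,        # consistent, robust behavior under perturbation
}


def heterogeneity(outputs: list[str]) -> float:
    """Normalized Shannon entropy of exact-match outputs, in [0, 1].

    0.0 means every sample is identical; 1.0 means every sample is distinct.
    """
    n = len(outputs)
    counts = Counter(outputs)
    if n <= 1 or len(counts) == 1:
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)  # log(n) is the maximum, reached when all n differ


def value(context: Context, outputs: list[str]) -> float:
    """Score the same measured variation differently depending on the context."""
    h = heterogeneity(outputs)
    return h if PREFERS_HETEROGENEITY[context] else 1.0 - h


if __name__ == "__main__":
    samples = ["Paris", "Paris", "Paris", "Paris"]  # four identical samples
    for ctx in Context:
        print(ctx.name, value(ctx, samples))
    # EPISTEMIC 1.0, INTERACTIONAL 0.0, SOCIETAL 0.0, SAFETY 1.0
```

The split is deliberate: `heterogeneity` is context-free, and only `value` consults the context, so identical samples score as ideal for factuality yet as mode collapse for user utility. This mirrors the abstract's framing of variation as shaped by task objectives rather than intrinsic to the model.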

