AICYLG · Jul 22, 2025

Can You Trust an LLM with Your Life-Changing Decision? An Investigation into AI High-Stakes Responses

arXiv:2507.21132v1 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

It addresses safety concerns for users who rely on LLMs for critical decisions, though its contributions (new benchmarks and a steering method) are incremental.

This paper investigates the risk that LLMs give confident but misguided responses to high-stakes life questions. It finds that top-performing models achieve high safety scores by frequently asking clarifying questions rather than issuing prescriptive advice, and that a model's cautiousness can be controlled via activation steering.

Large Language Models (LLMs) are increasingly consulted for high-stakes life advice, yet they lack standard safeguards against providing confident but misguided responses. This creates risks of sycophancy and over-confidence. This paper investigates these failure modes through three experiments: (1) a multiple-choice evaluation to measure model stability against user pressure; (2) a free-response analysis using a novel safety typology and an LLM Judge; and (3) a mechanistic interpretability experiment to steer model behavior by manipulating a "high-stakes" activation vector. Our results show that while some models exhibit sycophancy, others like o4-mini remain robust. Top-performing models achieve high safety scores by frequently asking clarifying questions, a key feature of a safe, inquisitive approach, rather than issuing prescriptive advice. Furthermore, we demonstrate that a model's cautiousness can be directly controlled via activation steering, suggesting a new path for safety alignment. These findings underscore the need for nuanced, multi-faceted benchmarks to ensure LLMs can be trusted with life-changing decisions.
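The abstract does not say how the "high-stakes" activation vector is constructed. A common recipe in the activation-steering literature is a difference of means: average a layer's activations over high-stakes prompts, subtract the average over low-stakes prompts, and add a scaled copy of that direction to the hidden state at inference time. The minimal sketch below assumes that recipe; it is not the authors' code. Toy lists of floats stand in for a real model's hidden states, and the function names and the scaling coefficient `alpha` are hypothetical.

```python
from statistics import fmean

Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def steering_vector(high_stakes: list[Vector], low_stakes: list[Vector]) -> Vector:
    """Difference of means (assumed recipe): points from the low-stakes
    activation centroid toward the high-stakes centroid."""
    high_mean = mean_vector(high_stakes)
    low_mean = mean_vector(low_stakes)
    return [h - l for h, l in zip(high_mean, low_mean)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add alpha * direction to a hidden state.

    alpha > 0 pushes the state toward the 'high-stakes' region (assumed here
    to correspond to more cautious behaviour); alpha < 0 pushes it away.
    """
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden: Vector, direction: Vector) -> float:
    """Scalar projection of a hidden state onto the steering direction,
    a rough readout of how 'high-stakes' the state looks."""
    norm = sum(d * d for d in direction) ** 0.5
    return sum(h * d for h, d in zip(hidden, direction)) / norm


if __name__ == "__main__":
    # Hypothetical toy activations; a real experiment would read these
    # from one transformer layer for each prompt.
    high = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
    low = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    v = steering_vector(high, low)            # [1.0, 2.0, 0.0]
    h = [0.0, 0.0, 1.0]
    steered = steer(h, v, alpha=0.5)          # [0.5, 1.0, 1.0]
    print(v, steered, projection(h, v), projection(steered, v))
```

In a real model the `steer` step would be applied inside a forward pass (for example, at one layer's residual stream), and sweeping `alpha` is what would let one dial cautiousness up or down; the plain-list version only shows the vector arithmetic.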
