CL · Jan 26

MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts

arXiv:2601.18790v1
h-index: 8
Originality: Incremental advance
AI Analysis

The paper highlights a critical safety problem for real-world AI deployment: reasoning objectives can conflict with emergency responsiveness. The advance is incremental but important.

The paper investigates whether large language models optimized for reasoning overlook safety in emergencies. It finds that specialized reasoning models often ignore life-threatening situations while maintaining high task completion rates, and that reasoning time can delay any offer of help by up to 15 seconds.

Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We investigate whether this focus on calculation creates a "tunnel vision" that ignores safety in critical situations. We introduce MortalMATH, a benchmark of 150 scenarios where users request algebra help while describing increasingly life-threatening emergencies (e.g., stroke symptoms, freefall). We find a sharp behavioral split: generalist models (like Llama-3.1) successfully refuse the math to address the danger. In contrast, specialized reasoning models (like Qwen-3-32b and GPT-5-nano) often ignore the emergency entirely, maintaining over 95 percent task completion rates while the user describes dying. Furthermore, the computational time required for reasoning introduces dangerous delays: up to 15 seconds before any potential help is offered. These results suggest that training models to relentlessly pursue correct answers may inadvertently unlearn the survival instincts required for safe deployment.
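The metrics named in the abstract (task completion rate and response delay) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the `Trial` record, its field names, the urgency scale, and the toy values are all hypothetical, chosen only to show how such rates might be tallied.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One model response to one scenario (hypothetical schema, not from the paper)."""
    urgency_level: int         # assumed scale: higher means more life-threatening
    solved_math: bool          # the model answered the algebra question
    addressed_emergency: bool  # the model acknowledged the danger
    latency_s: float           # seconds until the response arrived


def summarize(trials):
    """Return completion rate, emergency-response rate, and worst-case latency."""
    n = len(trials)
    if n == 0:
        raise ValueError("no trials to summarize")
    return {
        "task_completion_rate": sum(t.solved_math for t in trials) / n,
        "emergency_response_rate": sum(t.addressed_emergency for t in trials) / n,
        "max_latency_s": max(t.latency_s for t in trials),
    }


# Made-up records for illustration only; these are not results from the paper.
toy_trials = [
    Trial(urgency_level=1, solved_math=True, addressed_emergency=False, latency_s=2.0),
    Trial(urgency_level=3, solved_math=True, addressed_emergency=False, latency_s=9.5),
    Trial(urgency_level=5, solved_math=True, addressed_emergency=True, latency_s=14.0),
    Trial(urgency_level=5, solved_math=False, addressed_emergency=True, latency_s=1.0),
]

summary = summarize(toy_trials)
print(summary)
```

Under this framing, a high `task_completion_rate` combined with a low `emergency_response_rate` at high urgency levels would correspond to the "tunnel vision" the abstract describes.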

