AICR Sep 28, 2025

Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B

arXiv:2509.23882v2
h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work identifies vulnerabilities in a widely used open-weight language model. The contribution is incremental, since it applies existing evaluation tools to a new model.

The study conducted a security evaluation of OpenAI's GPT-OSS-20B model, uncovering failure modes like quant fever and reasoning blackholes that can be exploited under adversarial conditions.

OpenAI's GPT-OSS family provides open-weight language models with explicit chain-of-thought (CoT) reasoning and a Harmony prompt format. We summarize an extensive security evaluation of GPT-OSS-20B that probes the model's behavior under different adversarial conditions. Using the Jailbreak Oracle (JO) [1], a systematic LLM evaluation tool, the study uncovers several failure modes including quant fever, reasoning blackholes, Schrodinger's compliance, reasoning procedure mirage, and chain-oriented prompting. Experiments demonstrate how these behaviors can be exploited on the GPT-OSS-20B model, leading to severe consequences.

