AI CR | Sep 28, 2025

Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B

Shuyi Lin, Tian Lu, Zikai Wang, Bo Wen, Yibo Zhao, Cheng Tan

arXiv:2509.23882v2 | 3.3 | h-index: 3

Originality Synthesis-oriented

AI Analysis

This work identifies vulnerabilities in a widely used open-weight language model. The contribution is incremental, as it applies an existing evaluation tool to a new model.

The study evaluates the security of OpenAI's GPT-OSS-20B model and uncovers failure modes, such as quant fever and reasoning blackholes, that can be exploited under adversarial conditions.

OpenAI's GPT-OSS family provides open-weight language models with explicit chain-of-thought (CoT) reasoning and a Harmony prompt format. We summarize an extensive security evaluation of GPT-OSS-20B that probes the model's behavior under different adversarial conditions. Using the Jailbreak Oracle (JO) [1], a systematic LLM evaluation tool, the study uncovers several failure modes including quant fever, reasoning blackholes, Schrodinger's compliance, reasoning procedure mirage, and chain-oriented prompting. Experiments demonstrate how these behaviors can be exploited on the GPT-OSS-20B model, leading to severe consequences.
