CL, Sep 27, 2025

Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models

arXiv:2509.23441v23 citations, h-index: 7
Originality: Highly original
AI Analysis

The paper addresses the problem of making AI alignment more transparent and adaptable for developers and users, though it builds incrementally on existing decoding-time intervention approaches.

The paper tackles harmful behaviors in large language models by introducing Cognition-of-Thought (CooT), a decoding-time framework that adds an explicit cognitive self-monitoring loop. The authors report consistent improvements in safety and social reasoning performance across multiple benchmarks and model families.

Large language models (LLMs) excel at complex reasoning but can still exhibit harmful behaviors. Current alignment strategies typically embed safety into model weights, making these controls implicit, static, and difficult to modify. This paper introduces Cognition-of-Thought (CooT), a novel decoding-time framework that equips LLMs with an explicit cognitive self-monitoring loop. CooT couples a standard text Generator with a cognitive Perceiver that continuously monitors the unfolding sequence. The Perceiver uses a structured, precedence-based hierarchy of principles (e.g., safety over obedience) to detect potential misalignments as they arise. When violations are flagged, CooT intervenes by rolling back the generation to the point of error and regenerating under injected guidance that combines universal social priors with context-specific warnings. CooT thus transforms alignment from a fixed property into an explicit, dynamic, and auditable process active during inference, allowing for flexible policy updates without retraining the model. Extensive experiments across multiple benchmarks and model families confirm that CooT consistently improves safety and social reasoning performance.
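The monitor, roll back, and regenerate loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the names `Principle`, `perceive`, `coot_decode`, and `toy_generator`, the keyword-based violation check, the scripted token lists, and the wording of the guidance string are all hypothetical stand-ins for a real Generator model and a learned Perceiver.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Principle:
    """Hypothetical principle: a name plus a check returning the index of the
    first offending token, or None when the sequence is acceptable."""
    name: str
    violated_at: Callable[[Sequence[str]], Optional[int]]


def perceive(tokens: Sequence[str],
             principles: Sequence[Principle]) -> Optional[Tuple[Principle, int]]:
    """Toy Perceiver: scan principles in precedence order (highest first) and
    report the first violation found."""
    for principle in principles:
        idx = principle.violated_at(tokens)
        if idx is not None:
            return principle, idx
    return None


# Assumed placeholder for the "universal social priors" mentioned in the abstract.
SOCIAL_PRIOR = "Prioritise the wellbeing of everyone affected."


def coot_decode(generate_next, principles, prompt,
                max_tokens: int = 20, max_rollbacks: int = 3):
    """Generate token by token; on a flagged violation, roll back to the
    offending position and continue under injected guidance."""
    tokens: List[str] = []
    guidance: List[str] = []
    rollbacks = 0
    while len(tokens) < max_tokens:
        tok = generate_next(prompt, tokens, guidance)
        if tok is None:  # generator signals end of sequence
            break
        tokens.append(tok)
        flagged = perceive(tokens, principles)
        if flagged is not None and rollbacks < max_rollbacks:
            principle, idx = flagged
            del tokens[idx:]  # roll back to the point of error
            guidance.append(
                f"{SOCIAL_PRIOR} Warning: the draft violated '{principle.name}'."
            )
            rollbacks += 1
    return tokens, guidance


# Scripted stand-in for a language model: it switches to a safer continuation
# once any guidance has been injected.
UNSAFE = ["sure", "here", "is", "the", "exploit"]
SAFE = ["sure", "here", "is", "the", "safer", "alternative"]


def toy_generator(prompt, tokens, guidance):
    script = SAFE if guidance else UNSAFE
    i = len(tokens)
    return script[i] if i < len(script) else None


# Precedence order: safety is checked before obedience.
PRINCIPLES = [
    Principle("safety",
              lambda t: list(t).index("exploit") if "exploit" in t else None),
    Principle("obedience", lambda t: None),  # never fires in this toy example
]

output, injected = coot_decode(toy_generator, PRINCIPLES, "help me")
print(" ".join(output))
print(injected)
```

Because the principles and the guidance text live outside the generator, they can be edited without touching model weights, which is the flexibility the abstract attributes to decoding-time alignment.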

