LGMay 4, 2025

Semantic Probabilistic Control of Language Models

arXiv:2505.01954v14 citationsh-index: 4
Originality Highly original
AI Analysis

This addresses the challenge of precise semantic control in language models for applications requiring safe or tailored outputs, representing a novel method for a known bottleneck.

The paper tackles the problem of steering language model generations to satisfy subtle non-lexical constraints like toxicity or sentiment, which is computationally intractable due to non-decomposable verifiers, and achieves high-probability constraint satisfaction (>95%) without degrading generation quality.

Semantic control entails steering LM generations towards satisfying subtle non-lexical constraints, e.g., toxicity, sentiment, or politeness, attributes that can be captured by a sequence-level verifier. It can thus be viewed as sampling from the LM distribution conditioned on the target attribute, a computationally intractable problem due to the non-decomposable nature of the verifier. Existing approaches to LM control either only deal with syntactic constraints which cannot capture the aforementioned attributes, or rely on sampling to explore the conditional LM distribution, an ineffective estimator for low-probability events. In this work, we leverage a verifier's gradient information to efficiently reason over all generations that satisfy the target attribute, enabling precise steering of LM generations by reweighing the next-token distribution. Starting from an initial sample, we create a local LM distribution favoring semantically similar sentences. This approximation enables the tractable computation of an expected sentence embedding. We use this expected embedding, informed by the verifier's evaluation at the initial sample, to estimate the probability of satisfying the constraint, which directly informs the update to the next-token distribution. We evaluated the effectiveness of our approach in controlling the toxicity, sentiment, and topic-adherence of LMs yielding generations satisfying the constraint with high probability (>95%) without degrading their quality.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes