AI CL LG Oct 14, 2025

Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing

Rongzhi Zhang, Liqin Ye, Yuzhao Heng, Xiang Chen, Tong Yu, Lingkai Kong, Sudheer Chava, Chao Zhang

Georgia Tech

arXiv:2510.12121v17.82 citations h-index: 29 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for AI systems that adapt to diverse user expectations by generating outputs at exact attribute intensities. It appears incremental, however, because it builds on existing alignment methods with targeted improvements.

The paper tackles precise attribute intensity control in Large Language Models (LLMs), which current alignment methods fail to achieve reliably. It demonstrates that the proposed method enables fine-grained, continuous control with high accuracy in experiments on LLaMA-3.2-3b and Phi-4-mini.

Precise attribute intensity control--generating Large Language Model (LLM) outputs with specific, user-defined attribute intensities--is crucial for AI systems adaptable to diverse user expectations. Current LLM alignment methods, however, typically provide only directional or open-ended guidance, failing to reliably achieve exact attribute intensities. We address this limitation with three key designs: (1) reformulating precise attribute intensity control as a target-reaching problem, rather than simple maximization; (2) training a lightweight value function via temporal-difference learning to predict final attribute intensity scores from partial generations, thereby steering LLM outputs; and (3) employing gradient-based interventions on hidden representations to navigate the model precisely towards specific attribute intensity targets. Our method enables fine-grained, continuous control over attribute intensities, moving beyond simple directional alignment. Experiments on LLaMA-3.2-3b and Phi-4-mini confirm our method's ability to steer text generation to user-specified attribute intensities with high accuracy. Finally, we demonstrate efficiency enhancements across three downstream tasks: preference data synthesis, Pareto frontier approximation and optimization, and distillation of aligned behaviors for intervention-free inference. Our code is available at https://github.com/Pre-Control/pre-control
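The three designs in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear value function over a small hidden vector in place of a trained network over real LLM hidden states. The names `value`, `td0_update`, and `steer_hidden`, and all step sizes and tolerances, are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def value(w: Vector, b: float, h: Vector) -> float:
    """Toy linear value function: predicted final attribute score for hidden state h."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def td0_update(
    w: Vector,
    b: float,
    h_t: Vector,
    h_next: Optional[Vector],
    reward: float,
    terminal: bool,
    lr: float = 0.05,
    gamma: float = 1.0,
) -> Tuple[Vector, float]:
    """One TD(0) step: move V(h_t) toward reward + gamma * V(h_next).

    At the end of a generation (terminal=True) the bootstrap term is dropped,
    so the target is just the observed final attribute score.
    """
    if terminal or h_next is None:
        td_target = reward
    else:
        td_target = reward + gamma * value(w, b, h_next)
    td_error = td_target - value(w, b, h_t)
    # For a linear V, the gradient with respect to w is h_t and with respect to b is 1.
    new_w = [wi + lr * td_error * hi for wi, hi in zip(w, h_t)]
    new_b = b + lr * td_error
    return new_w, new_b


def steer_hidden(
    w: Vector,
    b: float,
    h: Vector,
    target: float,
    step: float = 0.1,
    tol: float = 1e-3,
    max_iters: int = 100,
) -> Vector:
    """Target-reaching intervention: gradient descent on (V(h) - target)^2 over h.

    Unlike maximization, the update stops once V(h) is within tol of the target.
    """
    h = list(h)
    for _ in range(max_iters):
        gap = value(w, b, h) - target
        if abs(gap) <= tol:
            break
        # d/dh (V(h) - target)^2 = 2 * gap * w for a linear V.
        h = [hi - step * 2.0 * gap * wi for hi, wi in zip(h, w)]
    return h


if __name__ == "__main__":
    w, b = [0.5, -0.25, 1.0], 0.0
    steered = steer_hidden(w, b, [0.0, 0.0, 0.0], target=3.0)
    print("predicted intensity after steering:", value(w, b, steered))
```

The squared-distance objective is what separates target reaching from directional alignment: the gradient changes sign when the predicted score overshoots the target, so the edit can push the representation in either direction.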
