LG · AI · CL · Apr 16

FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models

arXiv:2604.15488 · 87.7 · h-index: 4 · Has Code
Predicted impact: top 10% in LG · last 90 days · Originality: Incremental advance
AI Analysis

For practitioners needing to adjust LLM behavior without retraining, FineSteer offers a more effective and utility-preserving steering method than existing approaches.

FineSteer introduces a two-stage inference-time steering framework for LLMs. It pairs conditional steering, which preserves utility by skipping queries that do not need steering, with a mixture of steering experts that generates query-specific steering vectors. On safety and truthfulness benchmarks, it achieves stronger steering performance with minimal utility loss.

Large language models (LLMs) often exhibit undesirable behaviors, such as safety violations and hallucinations. Although inference-time steering offers a cost-effective way to adjust model behavior without updating its parameters, existing methods often fail to be simultaneously effective, utility-preserving, and training-efficient due to their rigid, one-size-fits-all designs and limited adaptability. In this work, we present FineSteer, a novel steering framework that decomposes inference-time steering into two complementary stages: conditional steering and fine-grained vector synthesis, allowing fine-grained control over when and how to steer internal representations. In the first stage, we introduce a Subspace-guided Conditional Steering (SCS) mechanism that preserves model utility by avoiding unnecessary steering. In the second stage, we propose a Mixture-of-Steering-Experts (MoSE) mechanism that captures the multimodal nature of desired steering behaviors and generates query-specific steering vectors for improved effectiveness. Through tailored designs in both SCS and MoSE, FineSteer maintains robust performance on general queries while adaptively optimizing steering vectors for targeted inputs in a training-efficient manner. Extensive experiments on safety and truthfulness benchmarks show that FineSteer outperforms state-of-the-art methods in overall performance, achieving stronger steering performance with minimal utility loss. Code is available at https://github.com/YukinoAsuna/FineSteer
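The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how SCS decides when to steer or how MoSE combines its experts, so the subspace-energy gate, the softmax router, and every name below (`subspace_score`, `steer`, `threshold`, `alpha`) are assumptions made for illustration. The sketch uses plain Python lists so that it runs without dependencies.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def subspace_score(h, basis):
    """Fraction of the squared norm of h lying in span(basis).

    Assumption: `basis` is a list of orthonormal vectors spanning the
    directions associated with queries that should be steered.
    """
    energy = sum(dot(b, h) ** 2 for b in basis)
    return energy / (dot(h, h) + 1e-12)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def steer(h, basis, router, experts, threshold=0.5, alpha=1.0):
    """Hypothetical two-stage steering of one hidden state h.

    Stage 1 (conditional steering): leave h untouched unless enough of
    its energy falls inside the target subspace.
    Stage 2 (expert mixture): build a query-specific steering vector as
    a softmax-weighted sum of expert vectors, then add it to h.
    Returns (new hidden state, whether steering was applied).
    """
    if subspace_score(h, basis) < threshold:
        return list(h), False
    weights = softmax([dot(r, h) for r in router])
    vector = [
        sum(w * e[i] for w, e in zip(weights, experts))
        for i in range(len(h))
    ]
    return [x + alpha * v for x, v in zip(h, vector)], True


# Toy data: the target subspace is the first two coordinates.
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
router = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
experts = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

general = [0.0, 0.1, 1.0, 1.0]   # mostly outside the subspace
targeted = [1.0, 1.0, 0.1, 0.0]  # mostly inside the subspace

out_general, steered_general = steer(general, basis, router, experts)
out_targeted, steered_targeted = steer(targeted, basis, router, experts)
```

In this toy setup the general query passes through unchanged because almost none of its energy lies in the target subspace, while the targeted query receives an equal blend of the two expert vectors. A real system would operate on model activations at chosen layers and would learn the subspace, router, and experts from data.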

