VISPA: Pluralistic Alignment via Automatic Value Selection and Activation
VISPA addresses the problem of ensuring that language models reflect diverse human values in high-stakes applications, offering a scalable approach to pluralistic alignment.
The paper tackles the challenge of aligning large language models with a range of human perspectives rather than with average preferences. It introduces VISPA, a training-free framework that enables direct control over value expression through dynamic selection and internal activation steering, and reports that it performs well across pluralistic alignment modes in healthcare and other domains.
As large language models are increasingly used in high-stakes domains, it is essential that their outputs reflect not the average human preference but a range of varying perspectives. Achieving such pluralism, however, remains challenging. Existing approaches consider a limited set of values or rely on prompt-level interventions, and so lack value control and representation. To address this, we introduce VISPA, a training-free pluralistic alignment framework that enables direct control over value expression through dynamic selection and internal model activation steering. Across extensive empirical studies spanning multiple models and evaluation settings, we show that VISPA is performant across all pluralistic alignment modes in healthcare and beyond. Further analysis reveals that VISPA is adaptable to different steering initiations, models, and/or values. These results suggest that pluralistic alignment can be achieved through internal activation mechanisms, offering a scalable path toward language models that serve all.
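
The abstract does not spell out how selection and steering are carried out, so the following is only a generic toy sketch of the "select values, then steer activations" idea, not VISPA's actual procedure. It assumes that each value is represented by a direction vector in the model's hidden space, that relevant values are chosen by cosine similarity to a query representation, and that steering adds a scaled copy of each chosen direction to a hidden state. The function names (`select_values`, `steer`), the value names, the three-dimensional vectors, and the `alpha` strength parameter are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_values(query_vec, value_vecs, top_k=2):
    """Hypothetical selection step: keep the top_k values most similar to the query."""
    ranked = sorted(
        value_vecs,
        key=lambda name: cosine(query_vec, value_vecs[name]),
        reverse=True,
    )
    return ranked[:top_k]


def steer(hidden, value_vecs, selected, alpha=1.0):
    """Hypothetical steering step: add alpha * direction for each selected value."""
    steered = list(hidden)
    for name in selected:
        for i, component in enumerate(value_vecs[name]):
            steered[i] += alpha * component
    return steered


# Toy value directions (assumed for illustration; real directions would be
# extracted from a model's activations and have the model's hidden size).
value_vecs = {
    "autonomy": [1.0, 0.0, 0.0],
    "beneficence": [0.0, 1.0, 0.0],
    "justice": [0.0, 0.0, 1.0],
}

query = [0.9, 0.4, 0.1]   # stand-in for a representation of the user query
hidden = [0.5, 0.5, 0.5]  # stand-in for one hidden state at one layer

selected = select_values(query, value_vecs, top_k=2)
steered = steer(hidden, value_vecs, selected, alpha=0.5)
print(selected, steered)
```

In a real model the addition would typically be applied to a layer's activations during the forward pass, and the choice of layer, direction extraction method, and steering strength would all need to be tuned; none of those details are given in the abstract.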