MA · CL · Mar 11

Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

Yuanhong Wu, Djallel Bouneffouf, D. Frank Hsu

arXiv:2603.11126v1 · 16.4 · h-index: 25

Predicted impact: top 48% in MA · last 90 days · Originality: Incremental advance

AI Analysis

This work aims to make LLM deployment safer and more trustworthy by improving value alignment. It is an incremental advance that builds on existing multi-agent and fusion techniques.

The paper tackles the challenge of aligning large language models with human values by proposing a multi-agent system with combinatorial fusion, which outperforms single-agent baselines and prior aggregation methods on standard metrics.

Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes multi-agent fusion alignment. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages cognitive diversity between agents to mitigate conflicts and redundancies among them, producing responses that better reflect human values. Empirical evaluation demonstrates that VAS-CFA outperforms both single-agent baselines and prior aggregation approaches on standard metrics, showing that multi-agent fusion provides a robust and effective mechanism for advancing value alignment in LLMs.
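To make the rank- and score-based aggregation mentioned in the abstract more concrete, here is a minimal Python sketch of the general idea behind combinatorial fusion. It is not the authors' implementation. The function names, the min-max normalization, the unweighted averaging, and the example scores are all assumptions made for illustration, and cognitive diversity is computed here as the root-mean-square distance between two agents' rank-score characteristic curves, which is one common formulation.

```python
import math


def normalize(scores):
    """Min-max scale a list of scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ranks(scores):
    """Rank candidates by descending score; rank 1 is best, ties broken by index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result


def score_combination(agent_scores):
    """Average the normalized scores each agent gives to every candidate."""
    normed = [normalize(s) for s in agent_scores]
    n = len(normed[0])
    return [sum(a[i] for a in normed) / len(normed) for i in range(n)]


def rank_combination(agent_scores):
    """Average the ranks each agent gives to every candidate (lower is better)."""
    ranked = [ranks(s) for s in agent_scores]
    n = len(ranked[0])
    return [sum(a[i] for a in ranked) / len(ranked) for i in range(n)]


def rank_score_characteristic(scores):
    """Normalized scores in descending order: entry i is the score at rank i + 1."""
    return sorted(normalize(scores), reverse=True)


def cognitive_diversity(scores_a, scores_b):
    """RMS distance between two agents' rank-score characteristic curves."""
    fa = rank_score_characteristic(scores_a)
    fb = rank_score_characteristic(scores_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))


# Hypothetical data: three moral agents each score four candidate responses.
agent_scores = [
    [0.9, 0.4, 0.7, 0.1],
    [0.2, 0.8, 0.6, 0.5],
    [0.6, 0.5, 0.9, 0.3],
]

fused_scores = score_combination(agent_scores)
fused_ranks = rank_combination(agent_scores)

# Pick the candidate with the highest fused score and the lowest fused rank.
best_by_score = max(range(len(fused_scores)), key=lambda i: fused_scores[i])
best_by_rank = min(range(len(fused_ranks)), key=lambda i: fused_ranks[i])
diversity_ab = cognitive_diversity(agent_scores[0], agent_scores[1])
```

In this toy example both fusion modes select the same candidate, but they can disagree in general, which is why a system might compare the two. A diversity value near zero suggests two agents score candidates with a similar profile, so combining them adds little.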
