MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning

Tianmeng Hu, Biao Luo, Chunhua Yang, Tingwen Huang

arXiv:2603.00730v1 | 59 citations | IEEE Trans Pattern Anal Mach Intell

Originality: Incremental advance

AI Analysis

This work addresses a gap in cooperative decision-making for scenarios with multiple agents and conflicting objectives. The contribution is incremental, however, because it builds on the existing centralized training with decentralized execution (CTDE) framework.

The paper tackles the multi-objective multi-agent reinforcement learning (MOMARL) problem with MO-MIX. The method generates an approximation of the Pareto set and significantly outperforms the baseline method on all four evaluation metrics while requiring less computational cost.
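The central claim above is that MO-MIX approximates the Pareto set. The sketch below is a minimal illustration, not code from the paper. It shows the Pareto-dominance test that defines non-dominated solutions and a two-objective hypervolume, a metric commonly used to score such approximations. The abstract does not name its four metrics, so whether hypervolume is one of them is an assumption. The function names and the sample returns are hypothetical, and every objective is assumed to be maximized.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a Pareto-dominates b (maximization): no worse in every objective, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(points: List[Point]) -> List[Point]:
    """O(n^2) filter keeping the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def hypervolume_2d(points: List[Point], ref: Tuple[float, float]) -> float:
    """Area dominated by a 2-objective front and bounded below by ref (maximization)."""
    front = non_dominated([p for p in points if p[0] > ref[0] and p[1] > ref[1]])
    front.sort(key=lambda p: p[0], reverse=True)  # x descending => y ascending on a front
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)  # add the strip this point contributes
        prev_y = y
    return area

# Hypothetical returns of four policies on two objectives.
returns = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
front = non_dominated(returns)  # (1.0, 1.0) is dominated by (2.0, 2.0)
hv = hypervolume_2d(returns, ref=(0.0, 0.0))
```

A larger hypervolume indicates a front that is closer to the true Pareto set, more spread out along it, or both. That is why it is a natural companion to the uniformity goal of the exploration guide.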

Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems. In many real-world scenarios, tasks often have several conflicting objectives and may require multiple agents to cooperate; these are multi-objective multi-agent decision-making problems. However, only a few works have addressed this intersection. Existing approaches are confined to separate fields: they handle either multi-agent decision-making with a single objective or multi-objective decision-making with a single agent. In this paper, we propose MO-MIX to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem. Our approach is based on the centralized training with decentralized execution (CTDE) framework. A weight vector representing preferences over the objectives is fed into the decentralized agent network as a condition for estimating the local action-value function, while a mixing network with a parallel architecture estimates the joint action-value function. In addition, an exploration guide approach is applied to improve the uniformity of the final non-dominated solutions. Experiments demonstrate that the proposed method can effectively solve the multi-objective multi-agent cooperative decision-making problem and generate an approximation of the Pareto set. Our approach not only significantly outperforms the baseline method on all four kinds of evaluation metrics but also requires less computational cost.
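The abstract describes three pieces: agents conditioned on a preference weight vector, vector-valued local action values, and a mixing network with a parallel architecture. The toy sketch below shows one way these pieces could fit together. Everything in it is an assumption for illustration, since the abstract does not specify layers and this is not the authors' implementation. The assumptions are: linear "networks" with random parameters, the hypothetical names `agent_q`, `greedy_action` and `parallel_mix`, conditioning by concatenating the preference to the observation, and one single-layer QMIX-style mixing head per objective with non-negative, state-conditioned weights.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def agent_q(obs: Sequence[float], pref: Sequence[float], params: List[Matrix]) -> Matrix:
    """Hypothetical linear agent: q[a][k] is the value of action a for objective k.
    The preference vector is concatenated to the observation as the condition."""
    x = list(obs) + list(pref)
    return [[dot(row, x) for row in per_action] for per_action in params]

def greedy_action(q: Matrix, pref: Sequence[float]) -> int:
    """Decentralized choice: maximize the scalarized value pref . q[a]."""
    scores = [dot(pref, qa) for qa in q]
    return max(range(len(q)), key=scores.__getitem__)

def parallel_mix(chosen: Matrix, state: Sequence[float], hyper_w, hyper_b) -> List[float]:
    """One mixing head per objective k. Each head weights the agents' k-th values with
    abs() of state-dependent weights, so Q_tot[k] is monotone in every agent's k-th value."""
    n_agents, n_obj = len(chosen), len(chosen[0])
    q_tot = []
    for k in range(n_obj):
        weights = [abs(dot(hyper_w[k][i], state)) for i in range(n_agents)]
        bias = dot(hyper_b[k], state)
        q_tot.append(sum(wi * chosen[i][k] for i, wi in enumerate(weights)) + bias)
    return q_tot

# Toy setup with random (hypothetical) parameters.
rng = random.Random(0)
n_agents, n_actions, n_obj, obs_dim, state_dim = 3, 4, 2, 5, 6
pref = [0.7, 0.3]  # assumed preference over the two objectives
in_dim = obs_dim + n_obj
params = [[[[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(n_obj)]
           for _ in range(n_actions)] for _ in range(n_agents)]
hyper_w = [[[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(n_agents)]
           for _ in range(n_obj)]
hyper_b = [[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(n_obj)]
obs = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]
state = [rng.uniform(-1, 1) for _ in range(state_dim)]

qs = [agent_q(obs[i], pref, params[i]) for i in range(n_agents)]
actions = [greedy_action(q, pref) for q in qs]
chosen = [qs[i][actions[i]] for i in range(n_agents)]  # vector value of each chosen action
q_tot = parallel_mix(chosen, state, hyper_w, hyper_b)  # one joint value per objective
joint_scalar = dot(pref, q_tot)  # scalarized joint value under this sketch's assumptions
```

Running a separate head per objective is one reading of "parallel architecture". Because the heads share no weights, changing one objective's local values leaves the other objectives' joint values untouched. Making the weights non-negative follows QMIX, where this keeps each joint value non-decreasing in the agents' values.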
