MA AI OCAug 20, 2024

Sequential Resource Trading Using Comparison-Based Gradient Estimation

Surya Murthy, Mustafa O. Karabag, Ufuk Topcu

arXiv:2408.11186v31.21 citationsh-index: 53Has Code

Originality Incremental advance

AI Analysis

This work addresses resource allocation in multi-agent systems, but it is incremental as it builds on existing gradient estimation and trading frameworks.

The paper tackles the problem of sequential resource trading between autonomous agents with unknown preferences by proposing an algorithm that estimates the other agent's gradient from acceptance/rejection responses to reach Pareto-optimal allocations. The result shows the algorithm improves societal benefit with fewer offers compared to baselines, validated in simulations and a user study with human participants.

Autonomous agents interact with other autonomous agents and humans of unknown preferences to share resources in their environment. We explore sequential trading for resource allocation in a setting where two greedily rational agents sequentially trade resources from a finite set of categories. Each agent has a utility function that depends on the amount of resources it possesses in each category. The offering agent makes trade offers to improve its utility without knowing the responding agent's utility function, and the responding agent only accepts offers that improve its utility. To facilitate cooperation between an autonomous agent and another autonomous agent or a human, we present an algorithm for the offering agent to estimate the responding agent's gradient (preferences) and make offers based on previous acceptance or rejection responses. The algorithm's goal is to reach a Pareto-optimal resource allocation state while ensuring that the utilities of both agents improve after every accepted trade. The algorithm estimates the responding agent's gradient by leveraging the rejected offers and the greedy rationality assumption, to prune the space of potential gradients. We show that, after the algorithm makes a finite number of rejected offers, the algorithm either finds a mutually beneficial trade or certifies that the current state is epsilon-weakly Pareto optimal. We compare the proposed algorithm against various baselines in continuous and discrete trading scenarios and show that it improves the societal benefit with fewer offers. Additionally, we validate these findings in a user study with human participants, where the algorithm achieves high performance in scenarios with high resource conflict due to aligned agent goals.

View on arXiv PDF Code

Similar