CL · GT · Feb 24, 2024

Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method

arXiv:2402.15813v3 · 44 citations · h-index: 12 · ACL
Originality: Incremental advance
AI Analysis

This paper addresses the need for quantitative assessment of negotiation skills in AI agents, though it is an incremental advance that builds on existing LLM frameworks.

The authors evaluate the bargaining abilities of LLM-driven agents by formalizing bargaining as an asymmetric incomplete information game and building a benchmark from a real product price dataset, AmazonHistoryPrice. They find that playing the Buyer is much harder than playing the Seller and that increasing model size does not effectively improve Buyer performance. Their proposed method, OG-Narrator, raises buyer deal rates from 26.67% to 88.88% and increases profits tenfold.

Bargaining is an important and unique part of negotiation between humans. As LLM-driven agents learn to negotiate and act like real humans, how to evaluate agents' bargaining abilities remains an open problem. For the first time, we formally describe the Bargaining task as an asymmetric incomplete information game, defining the gains of the Buyer and Seller over multiple bargaining processes. This formalization allows us to quantitatively assess an agent's performance in the Bargaining task. We collected a real product price dataset, AmazonHistoryPrice, and evaluated the bargaining abilities of various LLM agents. We find that playing a Buyer is much harder than playing a Seller, and that increasing model size cannot effectively improve the Buyer's performance. To address this challenge, we propose a novel approach called OG-Narrator that integrates a deterministic Offer Generator, which controls the price range of the Buyer's offers, with an LLM Narrator, which writes natural language sentences for the generated offers. Experimental results show that OG-Narrator improves the Buyer's deal rate from 26.67% to 88.88% and yields a tenfold increase in profits on all baselines, even for a model that has not been aligned.
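The split between a deterministic Offer Generator and an LLM Narrator can be illustrated with a minimal sketch. The abstract does not specify the offer schedule, so the linear ramp, the `start_ratio` and `max_rounds` parameters, and the `OfferGenerator` and `narrate` names below are all assumptions made for illustration, not the authors' implementation. The point it shows is only the separation of concerns: the price is fixed by code, and the language model is asked to phrase it, not to choose it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OfferGenerator:
    """Deterministic buyer offers on a linear ramp (hypothetical schedule).

    The ramp shape and default parameters are illustrative assumptions,
    not the schedule used in the paper.
    """
    listing_price: float
    budget: float
    start_ratio: float = 0.5  # assumed: open at 50% of the listing price
    max_rounds: int = 10      # assumed: number of rounds to reach the ceiling

    def offer(self, round_idx: int) -> float:
        # Never offer more than the budget or the listing price.
        ceiling = min(self.budget, self.listing_price)
        start = min(self.start_ratio * self.listing_price, ceiling)
        # Clamp the round index so late rounds stay at the ceiling.
        steps = max(self.max_rounds - 1, 1)
        frac = min(max(round_idx, 0), steps) / steps
        return round(start + frac * (ceiling - start), 2)


def narrate(price: float, llm: Optional[Callable[[str], str]] = None) -> str:
    """Wrap a fixed price in natural language.

    `llm` is a hypothetical callable (prompt -> text). Without it, a
    template sentence is returned so the sketch runs offline.
    """
    if llm is None:
        return f"I can offer ${price:.2f} for this item."
    return llm(f"Write one polite buyer message offering exactly ${price:.2f}.")


if __name__ == "__main__":
    generator = OfferGenerator(listing_price=100.0, budget=80.0)
    for round_idx in range(0, 10, 3):
        print(round_idx, narrate(generator.offer(round_idx)))
```

Because the generator alone decides the number, the Buyer's offers stay inside a known price range regardless of how the underlying model would have bargained on its own.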

Code Implementations: 1 repo