AI · Sep 8, 2025

Evaluating Multi-Turn Bargain Skills in LLM-Based Seller Agent

Yishu Wang, Kakam Chong, Xiaofeng Wang, Xu Yan, DeXin Kong, Chen Ju, Ming Chen, Shuai Xiao, Shuguang Han, jufeng chen

arXiv:2509.06341v1 · 3.3 · h-index: 5

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better evaluation of seller agents in online second-hand marketplaces. Its contribution is incremental: it focuses on benchmarking rather than proposing a new agent method.

The paper evaluates multi-turn bargaining skills in LLM-based seller agents with a framework that tests whether an agent can extract and track buyer intents across long negotiations. The result is a large-scale benchmark with 3,014 tasks and 9,892 products.

In online second-hand marketplaces, multi-turn bargaining is a crucial part of seller-buyer interactions. Large Language Models (LLMs) can act as seller agents, negotiating with buyers on behalf of sellers under given business constraints. A critical ability for such agents is to track and accurately interpret cumulative buyer intents across long negotiations, which directly impacts bargaining effectiveness. We introduce a multi-turn evaluation framework for measuring the bargaining ability of seller agents in e-commerce dialogues. The framework tests whether an agent can extract and track buyer intents. Our contributions are: (1) a large-scale e-commerce bargaining benchmark spanning 622 categories, 9,892 products, and 3,014 tasks; (2) a turn-level evaluation framework grounded in Theory of Mind (ToM) with annotated buyer intents, moving beyond outcome-only metrics; and (3) an automated pipeline that extracts reliable intent from massive dialogue data.
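The abstract does not spell out how turn-level intent tracking is scored, so the following is only a minimal sketch of one plausible reading: at each turn, the set of intents an agent has extracted so far is compared with the annotated cumulative set using precision, recall, and F1. The function names, the intent labels (`ask_discount`, `ask_shipping`), and the choice of set-based F1 are illustrative assumptions, not the paper's actual metric or data schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TurnScore:
    turn: int
    precision: float
    recall: float
    f1: float


def score_turn(turn: int, predicted: set[str], gold: set[str]) -> TurnScore:
    """Compare predicted and annotated cumulative intents for a single turn."""
    # Assumption: if neither side lists any intent, count the turn as fully correct.
    if not predicted and not gold:
        return TurnScore(turn, 1.0, 1.0, 1.0)
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return TurnScore(turn, precision, recall, f1)


def score_dialogue(
    predicted_by_turn: list[set[str]], gold_by_turn: list[set[str]]
) -> list[TurnScore]:
    """Score every turn of one dialogue; turns are numbered from 1."""
    if len(predicted_by_turn) != len(gold_by_turn):
        raise ValueError("predicted and gold must cover the same number of turns")
    return [
        score_turn(i, pred, gold)
        for i, (pred, gold) in enumerate(zip(predicted_by_turn, gold_by_turn), start=1)
    ]


# Hypothetical two-turn dialogue: the agent misses a shipping question in turn 2.
gold = [{"ask_discount"}, {"ask_discount", "ask_shipping"}]
pred = [{"ask_discount"}, {"ask_discount"}]
scores = score_dialogue(pred, gold)
for s in scores:
    print(f"turn {s.turn}: P={s.precision:.2f} R={s.recall:.2f} F1={s.f1:.2f}")
```

Scoring each turn separately, rather than only the final deal, shows where in a negotiation an agent starts to lose track of what the buyer wants, which is the kind of signal outcome-only metrics hide.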
