Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
This work addresses a gap in benchmarking e-commerce agents, though the contribution is incremental: it extends existing benchmarking frameworks with new data and evaluation dimensions.
The authors tackled the problem that current e-commerce agent benchmarks do not evaluate mixed-type dialogues or complex domain rules. They introduced Mix-ECom, a novel corpus of 4,799 samples covering multiple dialogue types, multiple task types, and 82 e-commerce rules, and showed that current agents struggle with hallucinations caused by these complexities.
E-commerce agents contribute greatly to helping users meet their e-commerce needs. To promote further research on and application of e-commerce agents, benchmarking frameworks have been introduced for evaluating LLM agents in the e-commerce domain. Despite this progress, current benchmarks do not evaluate agents' capability to handle mixed-type e-commerce dialogues and complex domain rules. To address this issue, this work first introduces a novel corpus, termed Mix-ECom, which is constructed from real-world customer-service dialogues with post-processing to remove private user information and to add chain-of-thought (CoT) reasoning. Specifically, Mix-ECom contains 4,799 samples, each mixing multiple dialogue types within a single e-commerce dialogue, and covers four dialogue types (QA, recommendation, task-oriented dialogue, and chit-chat), three e-commerce task types (pre-sales, logistics, and after-sales), and 82 e-commerce rules. Furthermore, this work builds baselines on Mix-ECom and proposes a dynamic framework to further improve performance. Results show that current e-commerce agents lack sufficient capability to handle e-commerce dialogues, owing to hallucinations caused by complex domain rules. The dataset will be made publicly available.
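To make the notion of a "mixed-type" dialogue concrete, the sketch below models one sample as a sequence of turns, each labeled with one of the four dialogue types, plus a task type and the set of applicable rules. This is a minimal illustration under stated assumptions: the class names, field names, per-turn labeling, rule IDs, and example turns are all hypothetical and do not describe the released dataset's actual format. Only the four dialogue types, three task types, and the count of 82 rules come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class DialogueType(Enum):
    # The four dialogue types named in the abstract.
    QA = "qa"
    RECOMMENDATION = "recommendation"
    TASK_ORIENTED = "task_oriented"
    CHIT_CHAT = "chit_chat"


class TaskType(Enum):
    # The three e-commerce task types named in the abstract.
    PRE_SALES = "pre_sales"
    LOGISTICS = "logistics"
    AFTER_SALES = "after_sales"


@dataclass
class Turn:
    # Hypothetical per-turn record: speaker, utterance, and the
    # dialogue type this turn is assumed to belong to.
    speaker: str
    text: str
    dialogue_type: DialogueType


@dataclass
class Sample:
    # Hypothetical sample layout; field names are illustrative only.
    task_type: TaskType
    turns: list[Turn]
    # Hypothetical IDs of which of the 82 domain rules apply to this sample.
    rule_ids: set[int] = field(default_factory=set)

    def dialogue_types(self) -> set[DialogueType]:
        # Distinct dialogue types appearing anywhere in the dialogue.
        return {t.dialogue_type for t in self.turns}

    def is_mixed_type(self) -> bool:
        # Assumed definition: a dialogue is "mixed-type" if it spans
        # two or more distinct dialogue types.
        return len(self.dialogue_types()) >= 2


# Illustrative example (made-up content): an after-sales dialogue that
# moves from chit-chat to a policy question to a concrete request.
sample = Sample(
    task_type=TaskType.AFTER_SALES,
    turns=[
        Turn("user", "Hi there!", DialogueType.CHIT_CHAT),
        Turn("user", "Can I return these shoes?", DialogueType.QA),
        Turn("user", "Please start a return for my order.", DialogueType.TASK_ORIENTED),
    ],
    rule_ids={7},  # hypothetical rule ID
)
print(sample.is_mixed_type())        # True
print(len(sample.dialogue_types()))  # 3
```

Under this framing, an agent cannot treat the dialogue as a single task type: it must switch between answering questions, acting on requests, and casual conversation while respecting whichever domain rules apply.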