UserSimCRS v2: Simulation-Based Evaluation for Conversational Recommender Systems
UserSimCRS v2 gives researchers working on conversational recommender systems an improved evaluation framework, though the contribution is incremental: it upgrades an existing toolkit rather than introducing a new one.
The authors address the scarcity of simulation-based evaluation resources for conversational recommender systems by upgrading the UserSimCRS toolkit to align with state-of-the-art research. The upgrade adds an enhanced agenda-based user simulator, LLM-based simulators, integration with a wider range of systems and datasets, and LLM-as-a-judge evaluation utilities.
Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade aligning the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, introduction of large language model-based simulators, integration for a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.