TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants
The collection gives researchers a benchmark for testing conversational and interactive knowledge assistants, addressing the need for standardized evaluation in this domain, though the contribution is incremental because it builds on existing TREC tracks.
The paper introduces the TREC iKAT 2023 test collection for evaluating conversational search agents. It comprises 36 personalized dialogues over 20 topics, each paired with a personal text knowledge base, and 344 turns with approximately 26,000 passages assessed for relevance. Generated responses are additionally assessed on relevance, completeness, groundedness, and naturalness.
Conversational information seeking has evolved rapidly in the last few years with the development of Large Language Models (LLMs), which provide the basis for interpreting and responding to user requests in a naturalistic manner. The extended TREC Interactive Knowledge Assistance Track (iKAT) collection aims to enable researchers to test and evaluate their Conversational Search Agents (CSAs). The collection contains a set of 36 personalized dialogues over 20 different topics, each coupled with a Personal Text Knowledge Base (PTKB) that defines the bespoke user personas. Relevance assessments are provided for a total of 344 turns with approximately 26,000 passages, along with additional assessments of generated responses over four key dimensions: relevance, completeness, groundedness, and naturalness. The collection challenges CSAs to navigate diverse personal contexts efficiently, elicit pertinent persona information, and employ context for relevant conversations. The integration of a PTKB and the emphasis on decisional search tasks contribute to the uniqueness of this test collection, making it an essential benchmark for advancing research in conversational and interactive knowledge assistants.