IR | AI | May 15, 2023

SWAN: A Generic Framework for Auditing Textual Conversational Systems

arXiv:2305.08290v1 | 11 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need to audit conversational systems in order to prevent negative impacts on users and society. It appears incremental, since it builds on existing measures such as S-measure and U-measure.

The authors introduce SWAN, a generic framework for auditing textual conversational systems. It computes a score from nugget sequences extracted from sampled conversation sessions, weighting each nugget by its position in the conversation according to a user model. They also propose a schema of twenty (+1) criteria that may be worth incorporating into the framework.

We present a simple and generic framework for auditing a given textual conversational system, given some samples of its conversation sessions as its input. The framework computes a SWAN (Schematised Weighted Average Nugget) score based on nugget sequences extracted from the conversation sessions. Following the approaches of S-measure and U-measure, SWAN utilises nugget positions within the conversations to weight the nuggets based on a user model. We also present a schema of twenty (+1) criteria that may be worth incorporating in the SWAN framework. In our future work, we plan to devise conversation sampling methods that are suitable for the various criteria, construct seed user turns for comparing multiple systems, and validate specific instances of SWAN for the purpose of preventing negative impacts of conversational systems on users and society. This paper was written while preparing for the ICTIR 2023 keynote (to be given on July 23, 2023).
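The position-based weighting described above can be illustrated with a minimal sketch. This is not the paper's definition of SWAN: the `Nugget` fields, the linear decay (loosely modelled on the S-measure idea that value falls off with reading position), and the `patience` parameter are all assumptions made for illustration, and only a single criterion is covered.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Nugget:
    position: int  # assumed: offset of the nugget within the session, e.g. in characters
    score: float   # assumed: per-nugget score in [0, 1] for one criterion


def linear_decay(position: int, patience: int) -> float:
    """Hypothetical user model: weight drops linearly to 0 at `patience`."""
    return max(0.0, 1.0 - position / patience)


def weighted_average_nugget_score(nuggets: Sequence[Nugget], patience: int) -> float:
    """Position-weighted average of nugget scores for one criterion."""
    weights = [linear_decay(n.position, patience) for n in nuggets]
    total = sum(weights)
    if total == 0:
        # No nugget falls within the modelled user's reading span.
        return 0.0
    return sum(w * n.score for w, n in zip(weights, nuggets)) / total


# Two nuggets: an early one scored 1.0 and a later one scored 0.0.
session = [Nugget(position=0, score=1.0), Nugget(position=500, score=0.0)]
print(round(weighted_average_nugget_score(session, patience=1000), 3))  # prints 0.667
```

Under these assumptions the early nugget carries twice the weight of the later one, so the session score leans towards the early nugget's score. A fuller instance would presumably also combine scores across the criteria in the schema, which this sketch leaves out.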

