SD · AI · CL · May 9

WASIL: In-the-Wild Arabic Spoken Interactions with LLMs

arXiv:2605.16364 · 85.1
Predicted impact: top 9% in SD · last 90 days
Originality: Incremental advance
AI Analysis

For researchers building Arabic voice assistants, this dataset and evaluation methodology address the lack of in-the-wild spoken interaction data with explicit feedback and answerability annotations.

The paper introduces WASIL, a dataset of 8,529 Arabic spoken interaction turns with ASR hypotheses, assistant responses, and like/dislike feedback, plus a 2,000-turn test set covering MSA and four dialects. It provides gold transcripts via multi-ASR agreement-guided post-editing and annotates answerability to isolate ASR effects, enabling reference-free evaluation of LLM responses.
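As a rough illustration of what agreement-guided triage for post-editing could look like, the sketch below flags a turn for human correction when several ASR hypotheses disagree. It is not the paper's procedure: the function names, the word-level similarity measure (Python's `difflib.SequenceMatcher`), and the 0.9 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_agreement(hyp_a: str, hyp_b: str) -> float:
    """Word-level similarity ratio in [0, 1] between two ASR hypotheses."""
    return SequenceMatcher(None, hyp_a.split(), hyp_b.split()).ratio()


def mean_agreement(hypotheses: list[str]) -> float:
    """Average pairwise agreement across all hypotheses for one turn."""
    pairs = list(combinations(hypotheses, 2))
    if not pairs:
        # A single hypothesis has nothing to disagree with.
        return 1.0
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


def needs_post_editing(hypotheses: list[str], threshold: float = 0.9) -> bool:
    """Flag a turn for human post-editing when the systems disagree.

    The threshold is a hypothetical value chosen for illustration.
    """
    return mean_agreement(hypotheses) < threshold


# Placeholder tokens stand in for Arabic words.
agreeing = ["w1 w2 w3", "w1 w2 w3", "w1 w2 w3"]
disagreeing = ["w1 w2 w3", "w1 w2 w4", "w1 w2 w3"]
print(needs_post_editing(agreeing), needs_post_editing(disagreeing))
```

Under this scheme, turns where the systems agree could be accepted with a light check, which is one way the transcription cost might be kept low.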

Large Language Model (LLM) voice assistants are commonly built as cascaded Automatic Speech Recognition (ASR)-to-LLM systems, where recognition errors can distort user intent. Dislikes may also arise from ambiguous, out-of-domain, or non-request turns, making it hard to isolate ASR effects. We release WASIL (the name denotes connection or linking in Arabic): in-the-wild Arabic spoken interaction prompts with audio, ASR hypotheses, assistant responses, and explicit like/dislike feedback (8,529 turns; 14.2% dislikes), plus a 2,000-turn test set covering Modern Standard Arabic (MSA) and four major dialects, with dialect labels. We provide low-cost gold transcripts via multi-ASR agreement-guided post-editing and annotate answerability (answerable, ambiguous/needs-clarification, unsupported, not-a-request/noise) to separate intrinsic unanswerability from ASR-induced degradation. Finally, we describe scalable reference-free evaluation of responses generated from ASR vs. gold transcripts, using multi-judge LLM scoring.

