CL | AI | SD | AS | Oct 22, 2024

VoiceBench: Benchmarking LLM-Based Voice Assistants

arXiv:2410.17196v3 | 174 citations | h-index: 45 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in benchmarking for voice assistants and enables better evaluation in real-world scenarios, though its contribution is incremental in that it builds on existing LLM advancements.

The authors tackled the lack of benchmarks for evaluating LLM-based voice assistants by introducing VoiceBench, which includes real and synthetic spoken instructions with real-world variations, revealing limitations of current models.

Building on the success of large language models (LLMs), recent advancements such as GPT-4o have enabled real-time speech interactions through LLM-based voice assistants, offering a significantly improved user experience compared to traditional text-based interactions. However, the absence of benchmarks designed to evaluate these speech interaction capabilities has hindered the development of LLM-based voice assistants. Current evaluations focus primarily on automatic speech recognition (ASR) or general knowledge evaluation with clean speech, neglecting the more intricate, real-world scenarios that involve diverse speaker characteristics, environmental factors, and content factors. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench also includes both real and synthetic spoken instructions that incorporate the above three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.

Code Implementations: 1 repo
