CL, SD, AS | Nov 13, 2024

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models

arXiv:2411.08742v1 | 11 citations | h-index: 12
Originality: Synthesis-oriented
AI Analysis

This work compares speech token representations for Speech LLMs, a question relevant to researchers and practitioners in speech processing and AI. Its contribution is incremental: it compares and analyzes existing approaches rather than introducing a new method.

The paper examines the performance gap between discrete and continuous speech tokens in Speech Large Language Models through a comparative study across semantic-related tasks. It finds that continuous features generally outperform discrete tokens, especially on tasks requiring fine-grained semantic understanding.

With the rise of Speech Large Language Models (Speech LLMs), there has been growing interest in discrete speech tokens for their ability to integrate seamlessly with text-based tokens. Most studies focus on continuous speech features, and although discrete-token-based LLMs have shown promising results on certain tasks, the performance gap between the two paradigms is rarely explored. In this paper, we present a fair and thorough comparison between discrete and continuous features across a variety of semantic-related tasks using a lightweight LLM (Qwen1.5-0.5B). Our findings reveal that continuous features generally outperform discrete tokens, particularly in tasks requiring fine-grained semantic understanding. Moreover, this study goes beyond surface-level comparison by identifying key factors behind the underperformance of discrete tokens, such as limited token granularity and inefficient information retention. Based on this analysis, we explore potential directions for improving the performance of discrete tokens. We hope our results offer new insights into the opportunities for advancing discrete speech tokens in Speech LLMs.
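
The contrast between continuous features and discrete tokens, and the idea of "limited token granularity", can be illustrated with a minimal sketch. It assumes discrete tokens are obtained by mapping each continuous frame vector to its nearest codebook entry (vector quantization); the abstract does not state which tokenizer the paper uses, so this is an illustration only. The synthetic features, the codebook sizes, and the helper names `quantize` and `reconstruction_error` are all hypothetical.

```python
import numpy as np


def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each continuous frame vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every frame and every entry: shape (T, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def reconstruction_error(features: np.ndarray, codebook: np.ndarray) -> float:
    """Mean squared error after replacing each frame by its codebook entry."""
    tokens = quantize(features, codebook)
    return float(((features - codebook[tokens]) ** 2).mean())


# Synthetic stand-in for continuous speech features: 200 frames, 16 dimensions.
# (Assumption: real systems would use encoder outputs, not random vectors.)
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))

# Hypothetical codebooks built from the first K frames. The small codebook is
# a subset of the large one, so the large one can never reconstruct worse.
small_codebook = features[:8]
large_codebook = features[:64]

tokens_small = quantize(features, small_codebook)  # discrete token IDs in [0, 8)
err_small = reconstruction_error(features, small_codebook)
err_large = reconstruction_error(features, large_codebook)

print("first ten tokens:", tokens_small[:10])
print("error with 8 entries:", err_small)
print("error with 64 entries:", err_large)
```

Each frame collapses to a single integer, so whatever the codebook cannot represent is lost before the LLM sees the input. A coarser codebook discards more of the original feature, which is one way to picture the granularity and information-retention factors the abstract names.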
