CL SD AS
Nov 4, 2022

Once-for-All Sequence Compression for Self-Supervised Speech Models

arXiv:2211.02332v4
6 citations
h-index: 52
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency in speech processing, but the advance is incremental: it builds on existing sequence compression methods and adds flexibility in the compression rate.

The paper tackles the high computational cost that long sequences impose on self-supervised speech models. It introduces a once-for-all framework that supports a continuous range of compression rates, shows only marginal degradation compared to fixed-rate variants, and enables task-specific rate selection without a grid search.

The sequence length along the time axis is often the dominant factor in the computational cost of speech processing. Several works have proposed reducing the sequence length to lower the computational cost of self-supervised speech models. However, different downstream tasks tolerate sequence compression to different degrees, so a model with a fixed compression rate may not fit all tasks. In this work, we introduce a once-for-all (OFA) sequence compression framework for self-supervised speech models that supports a continuous range of operating compression rates. The framework is evaluated on various tasks, showing marginal degradation compared to the fixed compression rate variants and a smooth performance-efficiency trade-off. We further explore adaptive compression rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search.
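To make the link between frame period and compute concrete, the sketch below estimates how many frames an utterance produces at a given frame period and how self-attention cost shrinks as the period grows. It is only an illustration, not the paper's method: the 20 ms baseline frame period, the quadratic attention cost model, and the function names `num_frames` and `relative_attention_cost` are all assumptions introduced here.

```python
import math


def num_frames(duration_s: float, frame_period_ms: float) -> int:
    """Frames emitted for an utterance of the given duration and frame period."""
    return math.ceil(duration_s * 1000.0 / frame_period_ms)


def relative_attention_cost(frame_period_ms: float,
                            base_period_ms: float = 20.0) -> float:
    """Self-attention cost relative to the baseline frame period.

    Assumes cost grows with the square of the sequence length, and that
    20 ms is the uncompressed frame period (an assumption, not a value
    taken from the paper).
    """
    return (base_period_ms / frame_period_ms) ** 2


if __name__ == "__main__":
    # Compare a few hypothetical operating points for a 10-second utterance.
    for period_ms in (20.0, 40.0, 80.0):
        frames = num_frames(10.0, period_ms)
        cost = relative_attention_cost(period_ms)
        print(f"{period_ms:.0f} ms -> {frames} frames, relative cost {cost}")
```

Under these assumptions, doubling the frame period halves the sequence length and cuts the attention term to a quarter, which is why a task that tolerates a longer frame period can run much more cheaply.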
