AS · AI · Jan 21

Fast-ULCNet: A fast and ultra low complexity network for single-channel speech enhancement

arXiv:2601.14925v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient speech enhancement in embedded devices, but it is incremental as it builds on an existing state-of-the-art model.

The paper reduces computational latency and complexity in single-channel speech enhancement for resource-constrained devices by replacing the GRU layers of ULCNet with FastGRNNs. It also adds a trainable complementary filter to mitigate the internal state drifting that degrades FastGRNN performance on long audio signals. The resulting model performs on par with the original ULCNet while reducing model size by more than half and latency by 34% on average.

Single-channel speech enhancement algorithms are often used in resource-constrained embedded devices, where low latency and low complexity designs gain more importance. In recent years, researchers have proposed a wide variety of novel solutions to this problem. In particular, a recent deep learning model named ULCNet is among the state-of-the-art approaches in this domain. This paper proposes an adaptation of ULCNet, by replacing its GRU layers with FastGRNNs, to reduce both computational latency and complexity. Furthermore, this paper shows empirical evidence on the performance decay of FastGRNNs in long audio signals during inference due to internal state drifting, and proposes a novel approach based on a trainable complementary filter to mitigate it. The resulting model, Fast-ULCNet, performs on par with the state-of-the-art original ULCNet architecture on a speech enhancement task, while reducing its model size by more than half and decreasing its latency by 34% on average.
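To make the GRU-to-FastGRNN substitution concrete, the following is a minimal sketch of a single FastGRNN cell using the standard update rule from the original FastGRNN formulation, together with a rough per-layer parameter count against a GRU. It is not the authors' implementation: the class name, the helper functions, the initial values, and the layer sizes (64 inputs, 128 hidden units) are illustrative assumptions, and the complementary filter proposed in the paper is not reproduced here because the abstract does not specify its form.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


class FastGRNNCell:
    """Illustrative FastGRNN cell (hypothetical class, not the paper's code).

    z_t  = sigmoid(W x_t + U h_{t-1} + b_z)
    h~_t = tanh(W x_t + U h_{t-1} + b_h)
    h_t  = (zeta * (1 - z_t) + nu) * h~_t + z_t * h_{t-1}

    W and U are shared by the gate and the candidate state, which is
    where the parameter savings over a GRU come from.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = init(hidden_size, input_size)
        self.U = init(hidden_size, hidden_size)
        self.b_z = [1.0] * hidden_size
        self.b_h = [0.0] * hidden_size
        # Trainable scalars, squashed to (0, 1) by a sigmoid.
        # The initial values below are arbitrary assumptions.
        self.zeta_raw = 1.0
        self.nu_raw = -4.0

    def step(self, x, h_prev):
        # Shared pre-activation W x_t + U h_{t-1}.
        pre = [a + b for a, b in zip(matvec(self.W, x), matvec(self.U, h_prev))]
        zeta, nu = sigmoid(self.zeta_raw), sigmoid(self.nu_raw)
        h_new = []
        for p, bz, bh, hp in zip(pre, self.b_z, self.b_h, h_prev):
            z = sigmoid(p + bz)
            h_tilde = math.tanh(p + bh)
            h_new.append((zeta * (1.0 - z) + nu) * h_tilde + z * hp)
        return h_new


def fastgrnn_param_count(n_in, n_hid):
    # W, U, two bias vectors, and the two scalars zeta and nu.
    return n_hid * n_in + n_hid * n_hid + 2 * n_hid + 2


def gru_param_count(n_in, n_hid):
    # Three weight sets (update gate, reset gate, candidate), one bias each.
    return 3 * (n_hid * n_in + n_hid * n_hid + n_hid)


if __name__ == "__main__":
    cell = FastGRNNCell(input_size=4, hidden_size=3)
    h = [0.0, 0.0, 0.0]
    for _ in range(5):  # five dummy frames
        h = cell.step([0.1, -0.2, 0.3, 0.0], h)
    print(h)
    print(fastgrnn_param_count(64, 128), gru_param_count(64, 128))
```

Under these assumed sizes the recurrent layer shrinks to roughly a third of the GRU's parameter count. The figures reported in the abstract (model size reduced by more than half) refer to the whole network, so the two numbers are not directly comparable. Note also that the previous state is carried forward through the gate `z_t` without a reset gate, which gives some intuition for why the hidden state could drift over long sequences.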

