Self-Attention for Audio Super-Resolution
This work addresses audio quality enhancement for applications such as media processing. Its contribution is incremental, since it adapts existing attention mechanisms to a specific domain.
The authors tackled audio super-resolution by proposing a network that combines convolution and self-attention, outperforming existing approaches on benchmarks and enabling faster training through increased parallelization.
Convolutions operate only locally and therefore fail to model global interactions. Self-attention, by contrast, can learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention. Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
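As a rough illustration of the modulation idea described above, the following NumPy sketch shows one way self-attention could modulate convolutional activations. It is not the authors' implementation. The block-wise max pooling, the single attention head, the scale-only (multiplicative) modulation, the random weights, and all names (`afilm_sketch`, `self_attention`, `block_size`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention.
    # x: (n_blocks, channels); every block attends to every other block,
    # so all positions are processed at once rather than step by step.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def afilm_sketch(features, block_size, w_q, w_k, w_v):
    # features: (time, channels) activations from a convolutional layer.
    # Assumption: time is divisible by block_size.
    t, c = features.shape
    n_blocks = t // block_size
    blocks = features.reshape(n_blocks, block_size, c)
    # Assumption: summarize each block by max pooling over time.
    pooled = blocks.max(axis=1)                       # (n_blocks, c)
    # Self-attention over the block summaries gives per-block,
    # per-channel modulation weights.
    modulation = self_attention(pooled, w_q, w_k, w_v)  # (n_blocks, c)
    # Assumption: modulation is a multiplicative scale, broadcast over
    # the time steps inside each block.
    out = blocks * modulation[:, None, :]
    return out.reshape(t, c)

# Toy usage with random activations and random (untrained) weights.
rng = np.random.default_rng(0)
time_steps, channels, block_size = 32, 4, 8
features = rng.standard_normal((time_steps, channels))
w_q, w_k, w_v = (rng.standard_normal((channels, channels)) for _ in range(3))
out = afilm_sketch(features, block_size, w_q, w_k, w_v)
print(out.shape)  # (32, 4)
```

In this sketch the attention step is a handful of matrix products over all blocks at once, whereas a recurrent layer would have to visit the blocks sequentially. That difference is the kind of parallelism the abstract refers to.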