LG CV ML | May 2, 2023

Sequence Modeling with Multiresolution Convolutional Memory

Jiaxin Shi, Ke Alexander Wang, Emily B. Fox

arXiv:2305.01638v2 | 18.024 citations | Has Code

Originality: Highly original

AI Analysis

This work addresses a fundamental problem in sequence modeling for tasks like classification and generative modeling, offering a novel method that balances computational efficiency and performance.

The paper tackles the challenge of efficiently capturing long-range patterns in sequential data by introducing a new building block for sequence modeling called MultiresLayer, which uses multiresolution convolution inspired by wavelet analysis. The model achieves state-of-the-art performance on sequence classification and autoregressive density estimation tasks with datasets like CIFAR-10, ListOps, and PTB-XL, while requiring significantly fewer parameters and maintaining an O(N log N) memory footprint.

Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in this space trade off between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural networks, or the parameter burden of convolutional networks with many or large filters. We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our MultiresConv can be implemented with shared filters across a dilated causal convolution tree. Thus it garners the computational advantages of convolutional networks and the principled theoretical motivation of wavelet decompositions. Our MultiresLayer is straightforward to implement, requires significantly fewer parameters, and maintains at most an $\mathcal{O}(N\log N)$ memory footprint for a length $N$ sequence. Yet, by stacking such layers, our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks using CIFAR-10, ListOps, and PTB-XL datasets.
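The idea of "shared filters across a dilated causal convolution tree" can be illustrated with a small single-channel sketch. This is not the authors' implementation: the function names `causal_dilated_conv` and `multires_conv_sketch`, the mixing weights `w`, the `depth` parameter, and the fixed Haar-style filters in the usage note are all assumptions made for illustration. The sketch assumes one low-pass filter `h0` and one high-pass filter `h1` are reused at every level, with the dilation doubling from one level to the next, and that the output is a weighted sum of the per-level detail signals plus the coarsest approximation.

```python
import numpy as np

def causal_dilated_conv(x, h, dilation):
    """Causal dilated convolution: y[t] = sum_k h[k] * x[t - k * dilation].

    Positions before the start of the sequence are treated as zeros,
    so y[t] never depends on x[s] for s > t.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k, hk in enumerate(h):
        shift = k * dilation
        if shift >= len(x):
            break
        y[shift:] += hk * x[:len(x) - shift]
    return y

def multires_conv_sketch(x, h0, h1, w, depth):
    """Hypothetical multiresolution convolution (illustrative only).

    h0, h1 : low-pass / high-pass filters shared by every level (assumed).
    w      : depth + 1 mixing weights, one per detail level plus one for
             the final approximation (assumed).
    """
    a = np.asarray(x, dtype=float)   # level-0 approximation is the input
    out = np.zeros_like(a)
    for j in range(depth):
        d = 2 ** j                                # dilation doubles per level
        detail = causal_dilated_conv(a, h1, d)    # fine-scale changes at level j
        a = causal_dilated_conv(a, h0, d)         # smoother signal passed down the tree
        out += w[j] * detail
    out += w[depth] * a                           # coarsest approximation
    return out
```

As a usage check under these assumptions, choosing Haar-style filters `h0 = [0.5, 0.5]` and `h1 = [0.5, -0.5]` with all weights equal to 1 makes the detail and approximation terms telescope back to the input, so `multires_conv_sketch(x, h0, h1, np.ones(depth + 1), depth)` returns `x`. Only the two filters and `depth + 1` weights are parameters in this sketch, regardless of how far back the deepest level looks. With `depth` on the order of `log2(N)`, keeping every level's length-`N` signal costs on the order of `N log N` values.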
