IGLOO: Slicing the Features Space to Represent Sequences
This work addresses a bottleneck in sequence modeling for applications such as bioinformatics and NLP, though it appears incremental compared with existing methods such as Transformers and LSTMs.
The authors tackled the problem of processing long sequences in neural networks by introducing IGLOO, a new architecture that represents a sequence through the relationships between non-local patches taken from convolutional feature maps. The model handles dependencies of more than 20,000 steps efficiently and reaches competitive results on tasks such as permuted MNIST (98.4%) and Wikitext-2.
Historically, recurrent neural networks (RNNs) and their variants such as LSTM and GRU, and more recently Transformers, have been the standard go-to components for processing sequential data with neural networks. One notable issue is the relative difficulty of dealing with long sequences (i.e. more than 20,000 steps). We introduce IGLOO, a new neural network architecture that aims to be efficient for short sequences while also being able to deal with long sequences. IGLOO's core idea is to use the relationships between non-local patches sliced out of the feature maps of successively applied convolutions to build a representation for the sequence. We show that the model can deal with dependencies of more than 20,000 steps in a reasonable time frame. We stress test IGLOO on the copy-memory and addition tasks, as well as on permuted MNIST (98.4%). For a larger task, we apply this new structure to the Wikitext-2 dataset (Merity et al., 2017b) and achieve a perplexity in line with baseline Transformers but lower than baseline AWD-LSTM. We also present how IGLOO is already used today in production for bioinformatics tasks.
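To make the core idea more concrete, the following is a minimal NumPy sketch of slicing non-local patches out of a convolutional feature map and reducing each patch to one value. It is an illustration under stated assumptions, not the authors' implementation: the function names `conv1d_same` and `patch_representation`, the parameters `num_patches` and `patch_size`, the ReLU non-linearity, and the random weights (standing in for learned ones) are all hypothetical choices made for this example.

```python
import numpy as np


def conv1d_same(x, kernels):
    """'Same'-padded 1D convolution followed by ReLU.

    x:       (T, C_in) input sequence
    kernels: (k, C_in, C_out) filter bank (assumed shape for this sketch)
    returns: (T, C_out) feature map
    """
    k = kernels.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    # Stack the k shifted views of the padded input: (T, k, C_in)
    windows = np.stack([xp[i:i + x.shape[0]] for i in range(k)], axis=1)
    return np.maximum(np.einsum("tkc,kco->to", windows, kernels), 0.0)


def patch_representation(feature_map, num_patches, patch_size, rng):
    """Summarise a feature map with patches of non-adjacent time steps.

    Each patch gathers `patch_size` randomly chosen time steps (hence
    "non-local"), multiplies them element-wise by a weight tensor and sums
    the result to a single scalar. Weights are random here; in a trained
    model they would be learned (assumption of this sketch).
    """
    T, C = feature_map.shape
    idx = rng.integers(0, T, size=(num_patches, patch_size))
    patches = feature_map[idx]  # (num_patches, patch_size, C)
    weights = 0.1 * rng.standard_normal((num_patches, patch_size, C))
    return np.maximum((patches * weights).sum(axis=(1, 2)), 0.0)


rng = np.random.default_rng(0)
x = rng.standard_normal((20000, 1))        # a 20,000-step toy sequence
kernels = rng.standard_normal((3, 1, 4))   # kernel size 3, 4 output channels
feature_map = conv1d_same(x, kernels)
representation = patch_representation(feature_map, num_patches=50,
                                      patch_size=4, rng=rng)
print(feature_map.shape, representation.shape)
```

In this sketch the length of the representation depends only on `num_patches`, not on the sequence length, which is one plausible reason a patch-based scheme can stay tractable on very long inputs.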