CL Aug 3, 2018

Efficient Purely Convolutional Text Encoding

Szymon Malik, Adrian Lancucki, Jan Chorowski

arXiv:1808.01160v10.31 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient text representations in NLP systems, such as conversational agents, but it is incremental as it builds on an existing recursive convolutional architecture.

The paper tackles the problem of creating efficient sentence embeddings by proposing a lightweight convolutional architecture that reduces training time and parameters while improving auto-encoding accuracy, achieving competitive performance on SentEval benchmarks as a low-resource alternative to bag-of-words embeddings.

In this work, we focus on a lightweight convolutional architecture that creates fixed-size vector embeddings of sentences. Such representations are useful for building NLP systems, including conversational agents. Our work derives from a recently proposed recursive convolutional architecture for auto-encoding text paragraphs at the byte level. We propose alterations that significantly reduce training time and the number of parameters, and that improve auto-encoding accuracy. Finally, we evaluate the representations created by our model on tasks from the SentEval benchmark suite, and show that the model can serve as a better, yet fairly low-resource, alternative to popular bag-of-words embeddings.
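To make the idea of a byte-level, recursively applied convolutional encoder concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the constants `DIM` and `MAX_LEN`, the functions `conv_relu`, `max_pool2`, and `encode`, the width-3 kernel, the max pooling, and the random untrained weights are all assumptions chosen for illustration. It only shows how a sentence of any length can be reduced to a fixed-size vector by repeatedly applying one shared convolution and halving the sequence.

```python
import random

DIM = 8        # embedding width (hypothetical, chosen for illustration)
MAX_LEN = 64   # padded byte length, a power of two (hypothetical)

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible

# Byte embedding table: one DIM-dimensional vector per possible byte value.
EMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(256)]

# One shared convolution kernel of width 3, indexed as KERNEL[k][i][o].
KERNEL = [[[rng.uniform(-0.5, 0.5) for _ in range(DIM)]
           for _ in range(DIM)]
          for _ in range(3)]


def conv_relu(seq):
    """Width-3 1-D convolution with zero padding, followed by ReLU."""
    zero = [0.0] * DIM
    padded = [zero] + seq + [zero]
    out = []
    for t in range(len(seq)):
        vec = []
        for o in range(DIM):
            acc = 0.0
            for k in range(3):
                x = padded[t + k]
                for i in range(DIM):
                    acc += x[i] * KERNEL[k][i][o]
            vec.append(max(0.0, acc))
        out.append(vec)
    return out


def max_pool2(seq):
    """Halve the sequence length with an element-wise max over pairs."""
    return [[max(a, b) for a, b in zip(seq[j], seq[j + 1])]
            for j in range(0, len(seq), 2)]


def encode(sentence):
    """Map a sentence to a fixed-size vector of length DIM."""
    data = sentence.encode("utf-8")[:MAX_LEN]    # truncation is a simplification
    data = data + bytes(MAX_LEN - len(data))     # pad with zero bytes
    seq = [EMBED[b] for b in data]
    # Apply the same convolution recursively until one vector remains.
    while len(seq) > 1:
        seq = max_pool2(conv_relu(seq))
    return seq[0]
```

Calling `encode("hello")` and `encode("a much longer sentence")` both return lists of `DIM` floats. Because the weights here are random, the vectors carry no meaning; in a real system they would be learned, for example with an auto-encoding objective as the abstract describes.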
