CL LG NE · Oct 2, 2018

Improving Sentence Representations with Consensus Maximisation

arXiv:1810.01064v4 · 0.52 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving sentence representations for natural language processing tasks, offering an incremental advance by combining existing self-supervision methods with a novel consensus-based approach.

The paper tackles the problem of learning high-quality sentence representations by proposing a self-supervised framework that minimises the disagreement between two encoders of the same sentence: a recurrent neural network (RNN) and a simple linear model. Each view on its own, as well as the ensemble of both, outperforms its single-view counterpart on downstream tasks, and the ensemble trained with consensus maximisation also outperforms an analogous ensemble built from single-view training.

Consensus maximisation learning can provide self-supervision when different views are available of the same data. The distributional hypothesis provides another form of useful self-supervision from adjacent sentences which are plentiful in large unlabelled corpora. Motivated by the observation that different learning architectures tend to emphasise different aspects of sentence meaning, we present a new self-supervised learning framework for learning sentence representations which minimises the disagreement between two views of the same sentence where one view encodes the sentence with a recurrent neural network (RNN), and the other view encodes the same sentence with a simple linear model. After learning, the individual views (networks) result in higher quality sentence representations than their single-view learnt counterparts (learnt using only the distributional hypothesis) as judged by performance on standard downstream tasks. An ensemble of both views provides even better generalisation on both supervised and unsupervised downstream tasks. Also, importantly the ensemble of views trained with consensus maximisation between the two different architectures performs better on downstream tasks than an analogous ensemble made from the single-view trained counterparts.
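
The two-view idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify the encoders in detail, the disagreement measure, or how the ensemble is formed, so the choices below are assumptions made for illustration: a plain tanh RNN whose final hidden state is the sentence vector, a linear map applied to averaged word vectors, `1 - cosine similarity` as the disagreement, and concatenation as the ensemble. All function names are hypothetical, and the distributional (adjacent-sentence) part of the objective is omitted.

```python
import math
import random


def linear_view(token_vecs, W):
    """Linear view: average the word vectors, then apply a linear map W."""
    dim = len(token_vecs[0])
    avg = [sum(v[i] for v in token_vecs) / len(token_vecs) for i in range(dim)]
    return [sum(w * x for w, x in zip(row, avg)) for row in W]


def rnn_view(token_vecs, W_in, W_rec):
    """Recurrent view: plain tanh RNN; the final hidden state is the output."""
    h = [0.0] * len(W_rec)
    for x in token_vecs:
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_in[j], x))
                + sum(w * hi for w, hi in zip(W_rec[j], h))
            )
            for j in range(len(h))
        ]
    return h


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def disagreement(a, b):
    """Assumed disagreement measure: 1 - cosine similarity (range 0 to 2)."""
    return 1.0 - cosine(a, b)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
emb_dim, rep_dim = 4, 3

# A toy "sentence" of five random word vectors (stand-ins for embeddings).
sentence = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(5)]

W = random_matrix(rep_dim, emb_dim, rng)      # linear view parameters
W_in = random_matrix(rep_dim, emb_dim, rng)   # RNN input weights
W_rec = random_matrix(rep_dim, rep_dim, rng)  # RNN recurrent weights

z_linear = linear_view(sentence, W)
z_rnn = rnn_view(sentence, W_in, W_rec)

# The quantity a consensus-maximising objective would push down in training.
loss = disagreement(z_rnn, z_linear)

# Assumed ensemble: concatenate the two views into one representation.
ensemble = z_rnn + z_linear
```

In a real training loop, the gradient of `loss` would update the parameters of both encoders jointly, alongside whatever adjacent-sentence objective supplies the distributional signal.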
