Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings
This work addresses the problem of learning general-purpose sentence embeddings that capture paraphrastic similarity, for use in both transfer learning and supervised settings.
The paper revisits the problem of learning paraphrastic sentence embeddings. It shows that, with training on sentence pairs rather than phrase pairs, averaging of hidden states, and aggressive regularization, LSTMs outperform word averaging. It also introduces a new recurrent architecture, the Gated Recurrent Averaging Network, that outperforms both.
We consider the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b). While they found LSTM recurrent networks to underperform word averaging, we present several developments that together produce the opposite conclusion. These include training on sentence pairs rather than phrase pairs, averaging states to represent sequences, and regularizing aggressively. These improve LSTMs in both transfer learning and supervised settings. We also introduce a new recurrent architecture, the Gated Recurrent Averaging Network, that is inspired by averaging and LSTMs while outperforming them both. We analyze our learned models, finding evidence of preferences for particular parts of speech and dependency relations.
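To make the contrast between word averaging and "averaging states to represent sequences" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: a plain tanh recurrent cell stands in for the LSTM, the weights are random rather than trained, and all names (`word_average`, `averaged_recurrent_states`, `W_x`, `W_h`, `b`) are hypothetical.

```python
import numpy as np

def word_average(embeddings):
    """Baseline: the sentence vector is the mean of its word vectors."""
    return embeddings.mean(axis=0)

def averaged_recurrent_states(embeddings, W_x, W_h, b):
    """Run a simple recurrent cell over the words and average all hidden states.

    Assumption: a tanh cell is used here as a stand-in for an LSTM, so the
    example stays short. The point is only that every time step contributes
    to the sentence vector, rather than the final state alone.
    """
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in embeddings:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.mean(states, axis=0)

def cosine(u, v):
    """Cosine similarity, a common way to compare two sentence embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy data: 5 random "word vectors" of dimension 4 (not real embeddings).
rng = np.random.default_rng(0)
dim, hidden = 4, 3
sentence = rng.normal(size=(5, dim))
W_x = rng.normal(scale=0.5, size=(hidden, dim))
W_h = rng.normal(scale=0.5, size=(hidden, hidden))
b = np.zeros(hidden)

avg_vec = word_average(sentence)
rnn_vec = averaged_recurrent_states(sentence, W_x, W_h, b)
```

Word averaging ignores word order, whereas the recurrent variant lets each state depend on the words before it. Averaging the states keeps that order sensitivity while still pooling over the whole sentence.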