Decoding Decoders: Finding Optimal Representation Spaces for Unsupervised Similarity Tasks
This work aims to improve performance on unsupervised similarity tasks for researchers and practitioners in NLP and other domains that use distributed representations. It is incremental in that it builds on existing concepts.
The paper explains why simple models outperform deep networks on unsupervised similarity tasks by introducing the concept of optimal representation spaces. It also presents a procedure that lets deep recurrent models match or exceed shallow models without retraining, and validates the analysis through empirical evaluations that introduce new sentence embedding models.
Experimental evidence indicates that simple models outperform complex deep networks on many unsupervised similarity tasks. We provide a simple yet rigorous explanation for this behaviour by introducing the concept of an optimal representation space, in which semantically close symbols are mapped to representations that are close under a similarity measure induced by the model's objective function. In addition, we present a straightforward procedure that, without any retraining or architectural modifications, allows deep recurrent models to perform as well as (and sometimes better than) shallow models. To validate our analysis, we conduct a set of consistent empirical evaluations and introduce several new sentence embedding models in the process. Although this work is presented within the context of natural language processing, the insights are readily applicable to other domains that rely on distributed representations for transfer tasks.
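As an illustration of why the choice of similarity measure matters, the following minimal Python sketch ranks the same toy vectors under three different measures. It is not the procedure from the paper: the vectors, the names `query` and `candidates`, and the helper `rank_neighbours` are hypothetical. The sketch assumes, for the sake of the example, a model whose objective scores pairs of representations with a dot product, in which case the dot product would be the measure to use at evaluation time.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: dot product of the length-normalised vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def neg_euclidean(u, v):
    """Negative Euclidean distance, so that larger means more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_neighbours(query, candidates, similarity):
    """Return candidate names sorted from most to least similar to query."""
    return sorted(
        candidates,
        key=lambda name: similarity(query, candidates[name]),
        reverse=True,
    )


# Hypothetical toy representations, chosen only to make the rankings differ.
query = [1.0, 0.0]
candidates = {
    "a": [3.0, 3.0],  # large norm, 45 degrees away from the query
    "b": [0.9, 0.1],  # small norm, almost parallel to the query
}

by_dot = rank_neighbours(query, candidates, dot)
by_cos = rank_neighbours(query, candidates, cosine)
by_euc = rank_neighbours(query, candidates, neg_euclidean)

print("dot product:", by_dot)
print("cosine:", by_cos)
print("negative Euclidean:", by_euc)
```

In this toy setting the dot product ranks "a" first, while cosine similarity and Euclidean distance rank "b" first. The same representations can therefore look better or worse on a similarity task depending on whether the evaluation measure matches the one the objective induces.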