Evaluation of Unsupervised Compositional Representations
This work provides an incremental analysis of compositional representations for NLP researchers, identifying performance differences across models and datasets.
The paper evaluated various compositional models on supervised and unsupervised benchmarks, finding that weighted vector averaging outperforms context-sensitive models in most cases, but RNN models are useful for certain classification tasks.
We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks. Our results confirm that weighted vector averaging can outperform context-sensitive models in most benchmarks, but structural features encoded in RNN models can also be useful in certain classification tasks. We analyzed some of the evaluation datasets to identify the aspects of meaning they measure and the characteristics of the various models that explain their performance variance.