Correlation-based Intrinsic Evaluation of Word Vector Representations
This work addresses the need for better evaluation metrics for word embeddings in natural language processing, and is aimed at researchers and practitioners who work with them. The contribution is incremental, since it builds on prior intrinsic evaluation approaches.
The paper tackles the problem of evaluating word vector representations by introducing QVEC-CCA, an intrinsic metric based on correlations with linguistic features. It shows that the metric is an effective proxy for extrinsic tasks, with higher and more consistent correlations than existing word similarity methods.
We introduce QVEC-CCA, an intrinsic evaluation metric for word vector representations based on correlations of learned vectors with features extracted from linguistic resources. We show that QVEC-CCA scores are an effective proxy for a range of extrinsic semantic and syntactic tasks. We also show that the proposed evaluation obtains higher and more consistent correlations with downstream tasks, compared to existing approaches to intrinsic evaluation of word vectors that are based on word similarity.
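To make the idea of correlating learned vectors with linguistic features concrete, the following is a minimal sketch and not the authors' implementation. It assumes the score is the first canonical correlation between an embedding matrix and a linguistic feature matrix whose rows are aligned on a shared vocabulary. The function names (`orthonormal_basis`, `first_canonical_correlation`) and the random toy matrices are hypothetical and introduced only for illustration.

```python
import numpy as np


def orthonormal_basis(m, tol=1e-10):
    """Orthonormal basis for the column space of the column-centered matrix."""
    centered = m - m.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Drop directions with (numerically) zero variance.
    keep = s > tol * s.max()
    return u[:, keep]


def first_canonical_correlation(x, y):
    """Largest canonical correlation between the columns of x and y.

    x: (n_words, d_embedding) learned word vectors (assumed layout).
    y: (n_words, d_features) linguistic feature vectors (assumed layout).
    Rows of x and y must refer to the same words in the same order.
    """
    qx = orthonormal_basis(x)
    qy = orthonormal_basis(y)
    if qx.shape[1] == 0 or qy.shape[1] == 0:
        return 0.0  # one side is constant, so nothing can correlate
    # Singular values of qx^T qy are the canonical correlations.
    sv = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(sv[0], 1.0))  # clip tiny floating-point overshoot


# Toy usage with random stand-ins for embeddings and linguistic features.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 20))
features = embeddings[:, :4] + 0.5 * rng.normal(size=(500, 4))
print(first_canonical_correlation(embeddings, features))
```

The orthonormal bases are computed via SVD rather than by inverting covariance matrices, which keeps the sketch stable when feature columns are sparse or linearly dependent, as annotation-derived features often are.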