Assessing Composition in Sentence Vector Representations
This work addresses the challenge of evaluating whether sentence vector representations capture compositional meaning, a question relevant to researchers in natural language processing. Its contribution is incremental in the sense that it focuses on assessment rather than on developing new models.
The authors address the problem of assessing how well neural sentence composition models capture compositional meaning by developing a method that probes sentence vector representations with controlled tasks. They find that the method extracts useful information about the differing capacities of existing models, although no specific numerical results are provided.
An important component of achieving language understanding is mastering the composition of sentence meaning, but an immediate obstacle is the opacity of the sentence vector representations produced by current neural sentence composition models. We present a method to address this challenge, developing tasks that directly target compositional meaning information in sentence vector representations with a high degree of precision and control. To enable the creation of these controlled tasks, we introduce a specialized sentence generation system that produces large, annotated sentence sets meeting specified syntactic, semantic and lexical constraints. We describe the details of the method and generation system, and then present results of experiments applying our method to probe for compositional information in embeddings from a number of existing sentence composition models. We find that the method is able to extract useful information about the differing capacities of these models, and we discuss what our results imply about how well these systems capture sentence information. We make the datasets used for these experiments, as well as the generation system, publicly available.
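To make the idea of a controlled probing task concrete, the following is a minimal sketch under stated assumptions, not the authors' generation system or experimental setup. The lexicon, the "is the target noun the agent?" task, and the two encoders (`embed_bow`, `embed_positional`) are hypothetical stand-ins chosen for illustration. Each generated sentence is paired with its role-swapped mirror so that lexical content is held constant, and a simple logistic-regression probe is trained on the resulting vectors and evaluated on sentences with held-out nouns.

```python
import math
from itertools import product

# Hypothetical toy lexicon for illustration; not drawn from the released datasets.
TARGET = "professor"
TRAIN_NOUNS = ["student", "lawyer", "doctor"]
TEST_NOUNS = ["artist", "pilot"]  # held out to test generalization of the probe
VERBS = ["helped", "called", "thanked"]
VOCAB = [TARGET] + TRAIN_NOUNS + TEST_NOUNS + VERBS
INDEX = {word: i for i, word in enumerate(VOCAB)}


def generate(nouns):
    """Return (sentence, label) pairs; label is 1 iff TARGET is the agent.

    Each sentence is an (agent, verb, patient) tuple, and every sentence is
    paired with its role-swapped mirror so lexical content is held constant.
    """
    data = []
    for noun, verb in product(nouns, VERBS):
        data.append(((TARGET, verb, noun), 1))
        data.append(((noun, verb, TARGET), 0))
    return data


def embed_bow(sentence):
    """Order-insensitive stand-in encoder: averaged one-hot word vectors."""
    vec = [0.0] * len(VOCAB)
    for word in sentence:
        vec[INDEX[word]] += 1.0 / len(sentence)
    return vec


def embed_positional(sentence):
    """Order-sensitive stand-in encoder: one one-hot block per word position."""
    vec = [0.0] * (len(VOCAB) * len(sentence))
    for slot, word in enumerate(sentence):
        vec[slot * len(VOCAB) + INDEX[word]] = 1.0
    return vec


def score(weights, bias, x):
    """Linear score of the probe for one sentence vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_probe(xs, ys, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(xs, ys):
            err = 1.0 / (1.0 + math.exp(-score(weights, bias, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias


def probe_accuracy(encoder):
    """Train on sentences with TRAIN_NOUNS, evaluate on held-out TEST_NOUNS."""
    train, test = generate(TRAIN_NOUNS), generate(TEST_NOUNS)
    weights, bias = train_probe([encoder(s) for s, _ in train], [y for _, y in train])
    hits = sum(int(score(weights, bias, encoder(s)) >= 0.0) == y for s, y in test)
    return hits / len(test)


if __name__ == "__main__":
    print("bag-of-words:", probe_accuracy(embed_bow))
    print("positional:  ", probe_accuracy(embed_positional))
```

In this toy setting the order-insensitive encoder stays at chance (0.5), because a sentence and its mirror map to the same vector but carry opposite labels, while the order-sensitive encoder reaches 1.0 on the held-out nouns. The design choice that matters is the mirrored pairing: by controlling lexical content, the probe's accuracy reflects whether role information is encoded rather than which words happen to appear.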