Using Holographically Compressed Embeddings in Question Answering
This work addresses the need for more precise embeddings in NLP tasks such as question answering, though it is incremental, building on existing methods.
The research tackles the problem of ambiguous word embeddings in question answering by using holographic compression to incorporate part-of-speech and named entity information without increasing the input size, preserving semantic relationships and yielding strong performance.
Word vector representations are central to deep learning natural language processing models. These vectors, known as embeddings, come in many forms, including word2vec and GloVe. Embeddings are trained on large corpora and learn each word's usage in context, capturing the semantic relationships between words. However, the semantics learned this way are at the level of distinct words (known as word types), and can be ambiguous when, for example, a word type can be either a noun or a verb. In question answering, parts of speech and named entity types are important, but encoding these attributes in neural models expands the size of the input. This research employs holographic compression of pre-trained embeddings to represent a token, its part of speech, and its named entity type in the same dimension as the token alone. Implemented in a modified recurrent deep learning network for question answering, the approach preserves semantic relationships and yields strong performance.
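The abstract does not spell out the compression scheme. The sketch below is a minimal illustration assuming holographic reduced representations: attributes are bound to role vectors by circular convolution and bundled with the token vector by addition, so the result keeps the token's width and the tags stay approximately recoverable by unbinding. The 256-dimension width, the role and tag vectors, and the random stand-ins for pre-trained GloVe vectors are all hypothetical. This is not the authors' implementation.

```python
import math
import random

DIM = 256  # hypothetical embedding width, chosen for illustration only


def random_vector(rng, dim=DIM):
    """Random HRR vector with elements drawn i.i.d. from N(0, 1/dim)."""
    sd = 1.0 / math.sqrt(dim)
    return [rng.gauss(0.0, sd) for _ in range(dim)]


def circular_convolve(a, b):
    """Bind two vectors: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def involution(a):
    """Approximate inverse used for unbinding: a*[i] = a[-i mod n]."""
    n = len(a)
    return [a[-i % n] for i in range(n)]


def add(*vectors):
    """Superpose (bundle) vectors by element-wise addition."""
    return [sum(xs) for xs in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def compress(token_vec, pos_vec, ner_vec, pos_role, ner_role):
    """Token + (POS role bound to POS tag) + (NER role bound to NER tag); width is unchanged."""
    return add(token_vec,
               circular_convolve(pos_role, pos_vec),
               circular_convolve(ner_role, ner_vec))


def decode(compressed, role):
    """Unbind one role; the result is a noisy copy of the bound tag vector."""
    return circular_convolve(involution(role), compressed)


def nearest(vec, table):
    """Name of the table entry most similar (by cosine) to vec."""
    return max(table, key=lambda name: cosine(vec, table[name]))


# Usage with hypothetical vectors (random stand-ins for pre-trained embeddings).
rng = random.Random(0)
glove_run, glove_walk = random_vector(rng), random_vector(rng)
pos_role, ner_role = random_vector(rng), random_vector(rng)
pos_tags = {name: random_vector(rng) for name in ["NOUN", "VERB", "ADJ"]}
ner_tags = {name: random_vector(rng) for name in ["O", "PERSON", "ORG"]}

# The same word type "run" gets distinct vectors for its verb and noun uses.
run_verb = compress(glove_run, pos_tags["VERB"], ner_tags["O"], pos_role, ner_role)
run_noun = compress(glove_run, pos_tags["NOUN"], ner_tags["O"], pos_role, ner_role)

print(len(run_verb), nearest(decode(run_verb, pos_role), pos_tags))
print(round(cosine(run_verb, glove_run), 2), round(cosine(run_verb, run_noun), 2))
```

This direct convolution costs O(n²) per binding and is written that way for readability. The same operation can be computed in O(n log n) with an FFT, since circular convolution is element-wise multiplication in the frequency domain.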