CL, LG · Oct 24, 2022

Subspace Representations for Soft Set Operations and Sentence Similarities

arXiv:2210.13034v4 · 31 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This work improves the expressiveness of word-set representations in NLP, an area where conventional vector-based approaches fall short. It is an incremental advance because it builds on existing pre-trained word embeddings.

The paper tackles the problem of representing sets of words in NLP by proposing subspace-based representations that support soft set operations such as union and intersection. It reports that this approach outperforms conventional vector-based methods on sentence similarity and set retrieval tasks across standard benchmarks.

In the field of natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words. Yet, when it comes to the representations of sets of words, the conventional vector-based approaches often struggle with expressiveness and lack the essential set operations such as union, intersection, and complement. Inspired by quantum logic, we realize the representation of word sets and corresponding set operations within pre-trained word embedding spaces. By grounding our approach in the linear subspaces, we enable efficient computation of various set operations and facilitate the soft computation of membership functions within continuous spaces. Moreover, we allow for the computation of the F-score directly within word vectors, thereby establishing a direct link to the assessment of sentence similarity. In experiments with widely-used pre-trained embeddings and benchmarks, we show that our subspace-based set operations consistently outperform vector-based ones in both sentence similarity and set retrieval tasks.
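The subspace idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`basis`, `membership`, `union`, `complement`, `intersection`, `soft_f_score`) are hypothetical, the set operations follow the standard quantum-logic reading (union as sum space, complement as orthogonal complement, intersection via De Morgan), and the F-score assumes a simple average of soft memberships in each direction, which the abstract does not spell out.

```python
import numpy as np

def basis(vectors, tol=1e-10):
    """Orthonormal basis (as columns) for the span of the given row vectors."""
    a = np.atleast_2d(np.asarray(vectors, dtype=float))
    u, s, _ = np.linalg.svd(a.T, full_matrices=False)
    return u[:, s > tol]

def membership(v, b):
    """Soft membership of v in span(b): norm of the projection over norm of v."""
    v = np.asarray(v, dtype=float)
    return float(np.linalg.norm(b.T @ v) / np.linalg.norm(v))

def union(b1, b2):
    """Soft union (assumed): the sum space span(b1) + span(b2)."""
    return basis(np.hstack([b1, b2]).T)

def complement(b):
    """Soft complement (assumed): the orthogonal complement of span(b)."""
    d = b.shape[0]
    # I - b b^T is a projector, so its singular values are 0 or 1.
    u, s, _ = np.linalg.svd(np.eye(d) - b @ b.T)
    return u[:, s > 0.5]

def intersection(b1, b2):
    """Soft intersection via De Morgan: the complement of (A-perp + B-perp)."""
    return complement(union(complement(b1), complement(b2)))

def soft_f_score(x, y):
    """Assumed F-score: harmonic mean of averaged soft memberships."""
    bx, by = basis(x), basis(y)
    precision = np.mean([membership(v, by) for v in x])
    recall = np.mean([membership(v, bx) for v in y])
    total = precision + recall
    return float(2 * precision * recall / total) if total > 0 else 0.0

# Toy 3-dimensional "word vectors" standing in for real pre-trained embeddings.
set_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
set_b = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
A, B = basis(set_a), basis(set_b)
both = intersection(A, B)   # subspace shared by the two word sets
either = union(A, B)        # subspace covering both word sets
```

In this sketch, a word vector that lies only partly inside a subspace receives a membership strictly between 0 and 1, which is what makes the set operations "soft". Real embeddings have hundreds of dimensions, so the explicit `d × d` projector in `complement` is used here only for clarity.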

Code Implementations: 1 repo
