Set-Theoretic Compositionality of Sentence Embeddings
This work addresses a gap in NLP by providing a systematic, task-independent method for evaluating the fundamental compositional properties of sentence encoders. The contribution is incremental in that it builds on existing set-theoretic concepts.
The paper evaluates the compositional properties of sentence embeddings in a task-independent setting by proposing six criteria based on set-theoretic operations. It finds that SBERT aligns with these criteria more consistently than the other encoders tested, including LLM-based ones.
Sentence encoders play a pivotal role in various NLP tasks; hence, an accurate evaluation of their compositional properties is paramount. However, existing evaluation methods focus predominantly on task-specific performance. This leaves a significant gap in understanding how well sentence embeddings exhibit fundamental compositional properties in a task-independent context. Leveraging classical set theory, we address this gap by proposing six criteria based on three core "set-like" compositions/operations: \textit{TextOverlap}, \textit{TextDifference}, and \textit{TextUnion}. We systematically evaluate $7$ classical and $9$ Large Language Model (LLM)-based sentence encoders to assess their alignment with these criteria. Our findings show that SBERT consistently demonstrates set-like compositional properties, surpassing even the latest LLMs. Additionally, we introduce a new dataset of ~$192$K samples designed to facilitate future benchmarking efforts on the set-like compositionality of sentence embeddings.
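To make the idea of a set-like criterion concrete, the following minimal sketch checks one plausible expectation for \textit{TextOverlap}: the embedding of the overlap text should be at least as similar to each source sentence as the two sources are to each other. This condition is an illustrative assumption and is not necessarily one of the six criteria proposed here. The function names (`cosine`, `overlap_is_set_like`, `alignment_rate`) and the toy 2-dimensional vectors are hypothetical, and cosine similarity is assumed as the similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def overlap_is_set_like(emb_a, emb_b, emb_overlap):
    """Hypothetical check (an assumption, not the paper's definition):
    the overlap embedding is at least as similar to each source
    embedding as the two source embeddings are to each other."""
    baseline = cosine(emb_a, emb_b)
    return (cosine(emb_overlap, emb_a) >= baseline
            and cosine(emb_overlap, emb_b) >= baseline)


def alignment_rate(samples):
    """Fraction of (emb_a, emb_b, emb_overlap) triples passing the check."""
    passed = sum(overlap_is_set_like(a, b, o) for a, b, o in samples)
    return passed / len(samples)


# Toy embeddings standing in for the output of a real sentence encoder.
samples = [
    ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]),    # overlap lies between A and B
    ([1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]),  # overlap points away from both
]
rate = alignment_rate(samples)
```

In practice the three vectors in each triple would come from encoding the two source sentences and their overlap text with the encoder under evaluation, and the pass rate would be aggregated over a dataset of such triples.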