LogiCoL: Logically-Informed Contrastive Learning for Set-based Dense Retrieval
LogiCoL addresses a specific bottleneck in dense retrieval: queries with logical constraints, which matter for downstream applications but are often overlooked. The contribution is an incremental improvement.
The paper targets the difficulty dense retrievers have with queries that contain logical connectives, where the retrieved results often fail to respect the query's logical constraints. It introduces LogiCoL, a logically-informed contrastive learning objective that improves both retrieval performance and logical consistency on entity retrieval tasks.
While significant progress has been made with dual- and bi-encoder dense retrievers, queries with logical connectives remain a use case that is often overlooked yet important in downstream applications. Current dense retrievers struggle with such queries: the retrieved results do not respect the logical constraints implied in the queries. To address this challenge, we introduce LogiCoL, a logically-informed contrastive learning objective for dense retrievers. LogiCoL builds upon in-batch supervised contrastive learning, and trains dense retrievers to respect the subset and mutually-exclusive set relations between query results via two sets of soft constraints expressed via t-norms in the learning objective. We evaluate the effectiveness of LogiCoL on the task of entity retrieval, where the model is expected to retrieve a set of entities in Wikipedia that satisfy the implicit logical constraints in the query. We show that models trained with LogiCoL yield improvements in both retrieval performance and the logical consistency of the results. We provide detailed analysis and insights to uncover why queries with logical connectives are challenging for dense retrievers and why LogiCoL is most effective.
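To make the idea of t-norm soft constraints concrete, the following is a minimal sketch, not the paper's actual objective. It assumes the product t-norm, and it assumes each query-document pair already has a relevance probability strictly between 0 and 1 (how LogiCoL derives such probabilities from in-batch similarities is not specified in the abstract). The function names `subset_penalty`, `exclusion_penalty`, and `logical_penalty` are hypothetical.

```python
import math


def subset_penalty(p_narrow, p_broad):
    """Penalty for violating 'relevant to the narrower query implies relevant to the broader one'.

    Assumption: product t-norm residuum, truth(a -> b) = min(1, b / a).
    The penalty is -log of that truth value, i.e. max(0, log a - log b).
    """
    return max(0.0, math.log(p_narrow) - math.log(p_broad))


def exclusion_penalty(p_a, p_b):
    """Penalty for a document that looks relevant to two mutually exclusive queries.

    Assumption: product t-norm, truth(a AND b) = a * b,
    so truth(NOT (a AND b)) = 1 - a * b and the penalty is its -log.
    """
    return -math.log(1.0 - p_a * p_b)


def logical_penalty(probs, subset_pairs, exclusive_pairs):
    """Sum both penalties over every document in the batch.

    probs[q][d]: hypothetical relevance probability of document d for query q,
                 assumed to lie strictly between 0 and 1.
    subset_pairs: (narrow, broad) query index pairs, results(narrow) within results(broad).
    exclusive_pairs: (a, b) query index pairs whose result sets are disjoint.
    """
    n_docs = len(probs[0])
    total = 0.0
    for narrow, broad in subset_pairs:
        for d in range(n_docs):
            total += subset_penalty(probs[narrow][d], probs[broad][d])
    for a, b in exclusive_pairs:
        for d in range(n_docs):
            total += exclusion_penalty(probs[a][d], probs[b][d])
    return total
```

In a training setup along these lines, a term like `logical_penalty` would be added to the usual in-batch contrastive loss with a weighting coefficient. The subset penalty is zero whenever the broader query scores a document at least as high as the narrower one, so it only activates when the implied set relation is violated.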