CL · Feb 15, 2024

Answer is All You Need: Instruction-following Text Embedding via Answering the Question

arXiv:2402.09642v1 · 39 citations · h-index: 13 · ACL
Originality: Highly original
AI Analysis

The paper addresses the need for user-oriented embeddings, offering a novel approach to customizable text representation.

The paper tackles the problem of creating text embeddings that follow user instructions by treating instructions as questions and encoding expected answers, resulting in significantly improved instruction-following capabilities for both large and smaller language models.

This work aims to build a text embedder that can capture characteristics of texts specified by user instructions. Despite the tremendous potential of deploying user-oriented embeddings, no previous approach provides a concrete solution for it. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the representation accordingly. Intuitively, texts with the same (implicit) semantics would share similar answers following the instruction, thus leading to more similar embeddings. Specifically, we propose InBedder, which instantiates this embed-via-answering idea by only fine-tuning language models on abstractive question answering tasks. InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of clustering outcomes, achieved by applying different instructions to the same corpus, demonstrates a high degree of interpretability.
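
The embed-via-answering intuition can be illustrated with a toy sketch. This is not the InBedder implementation: the lookup table `TOY_ANSWERS`, the helper names `answer`, `embed`, and `cosine`, and the example sentences are all hypothetical, and a bag-of-words count stands in for the hidden states a fine-tuned language model would produce. It only shows that the same text can land in different neighbourhoods depending on the instruction, because the embedding is derived from the expected answer rather than from the text itself.

```python
import math
from collections import Counter

# Hypothetical stand-in for a QA-tuned language model: a fixed lookup from
# (instruction, text) to a short answer. A real system would generate or
# encode these answers with a model; nothing here comes from the paper.
TOY_ANSWERS = {
    ("What is the topic?", "The striker scored twice, a great match!"): "football sports",
    ("What is the topic?", "The goalkeeper made a terrible mistake."): "football sports",
    ("What is the topic?", "The new phone has a great camera."): "technology phone",
    ("What is the sentiment?", "The striker scored twice, a great match!"): "positive",
    ("What is the sentiment?", "The goalkeeper made a terrible mistake."): "negative",
    ("What is the sentiment?", "The new phone has a great camera."): "positive",
}


def answer(instruction: str, text: str) -> str:
    """Return the toy answer to the instruction, read as a question about text."""
    return TOY_ANSWERS[(instruction, text)]


def embed(instruction: str, text: str) -> Counter:
    """Embed text as word counts of its expected answer (assumed toy encoder)."""
    return Counter(answer(instruction, text).lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


match = "The striker scored twice, a great match!"
mistake = "The goalkeeper made a terrible mistake."
phone = "The new phone has a great camera."

# Under the topic instruction, the two football sentences are neighbours.
topic_sim = cosine(embed("What is the topic?", match),
                   embed("What is the topic?", mistake))

# Under the sentiment instruction, the positive sentences are neighbours
# instead, even though their topics differ.
sentiment_sim = cosine(embed("What is the sentiment?", match),
                       embed("What is the sentiment?", phone))
```

In this sketch, swapping the instruction changes which pairs of sentences are similar without changing the sentences, which is the behaviour the instruction awareness tests described above are meant to probe.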

Code Implementations: 1 repo