CL · Feb 15, 2024

Answer is All You Need: Instruction-following Text Embedding via Answering the Question

Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, Jingbo Shang

arXiv:2402.09642v1 · 19.239 citations · h-index: 13 · Has Code · ACL

Originality: Highly original

AI Analysis

The paper addresses the need for user-oriented text embeddings, offering a novel approach to customizable text representation.

The paper tackles the problem of creating text embeddings that follow user instructions by treating instructions as questions and encoding expected answers, resulting in significantly improved instruction-following capabilities for both large and smaller language models.

This work aims to build a text embedder that can capture the characteristics of texts specified by user instructions. Despite the tremendous potential of deploying user-oriented embeddings, none of the previous approaches provides a concrete solution. This paper offers a new viewpoint: it treats the instruction as a question about the input text and encodes the expected answers to obtain the representation. Intuitively, texts with the same (implicit) semantics would share similar answers under the instruction, and would therefore receive more similar embeddings. Specifically, we propose InBedder, which instantiates this embed-via-answering idea by fine-tuning language models only on abstractive question answering tasks. InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of the clustering outcomes obtained by applying different instructions to the same corpus demonstrates a high degree of interpretability.
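A toy sketch may make the embed-via-answering idea concrete. It is not InBedder's implementation or API: `toy_answer` is a hypothetical keyword rule standing in for a language model fine-tuned on abstractive question answering, `embed` is a hypothetical bag-of-words encoder standing in for the model's own representation of the answer, and `greedy_cluster` is a simple threshold grouping chosen for brevity rather than the clustering used in the paper. What it illustrates is the core intuition: because the embedding is computed from the answer to the instruction rather than from the raw text, the same corpus groups differently under different instructions.

```python
import math
import re
from collections import Counter

SENTIMENT = "What is the sentiment of this text?"
TOPIC = "What is the topic of this text?"


def toy_answer(instruction: str, text: str) -> str:
    """Hypothetical stand-in for a QA-fine-tuned LM: (instruction, text) -> short answer."""
    q = instruction.lower()
    words = set(re.findall(r"[a-z]+", text.lower()))
    if "sentiment" in q:
        positive = {"love", "great", "delicious", "amazing"}
        negative = {"hate", "terrible", "bland", "awful"}
        score = len(words & positive) - len(words & negative)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"
    if "topic" in q:
        if words & {"pizza", "pasta", "food"}:
            return "food"
        if words & {"movie", "film", "actor"}:
            return "movies"
        return "other"
    return text  # unrecognized instruction: fall back to the raw text


def embed(answer: str) -> Counter:
    """Hypothetical encoder: sparse bag-of-words counts of the answer."""
    return Counter(re.findall(r"[a-z]+", answer.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def instructed_embedding(instruction: str, text: str) -> Counter:
    # Embed the answer to the instruction, not the input text itself.
    return embed(toy_answer(instruction, text))


def greedy_cluster(instruction: str, corpus: list[str], threshold: float = 0.9) -> list[list[str]]:
    """Assign each text to the first cluster whose representative is similar enough."""
    clusters: list[tuple[Counter, list[str]]] = []
    for text in corpus:
        vec = instructed_embedding(instruction, text)
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]


corpus = [
    "I love this pizza, it is delicious.",
    "The pasta was bland and terrible.",
    "What an amazing movie!",
    "I hate this film, the actor was awful.",
]

# Sentiment instruction: groups corpus[0] with corpus[2], corpus[1] with corpus[3].
print(greedy_cluster(SENTIMENT, corpus))
# Topic instruction: groups corpus[0] with corpus[1], corpus[2] with corpus[3].
print(greedy_cluster(TOPIC, corpus))
```

Under the sentiment instruction, the pizza review and the movie review both reduce to the answer "positive" and land in the same cluster despite their different topics; under the topic instruction, the two food reviews group together instead. A real system would replace both toy helpers with a trained model, but the data flow (instruction plus text, then answer, then embedding) is the part the sketch is meant to show.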
