CL · LG · Oct 28, 2022

You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM

arXiv:2210.15859v1 · 23 citations · h-index: 111
Originality: Incremental advance
AI Analysis

For NLP researchers working on retrieval-enhanced language models, this work examines when the $k$NN-LM should rely on its retrieved neighbors and offers an incremental perplexity improvement.

The paper investigates the role of lexical and semantic matching in retrieval-enhanced language models, specifically the kNN-LM, and proposes a new formulation that adjusts interpolation based on retrieval quality, achieving nearly 4% perplexity improvement on Wikitext-103.

Retrieval-enhanced language models (LMs), which condition their predictions on text retrieved from large external datastores, have recently shown significant perplexity improvements compared to standard LMs. One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model and requires no additional training. In this paper, we explore the importance of lexical and semantic matching in the context of items retrieved by $k$NN-LM. We find two trends: (1) the presence of large overlapping $n$-grams between the datastore and evaluation set is an important factor in strong performance, even when the datastore is derived from the training data; and (2) the $k$NN-LM is most beneficial when retrieved items have high semantic similarity with the query. Based on our analysis, we define a new formulation of the $k$NN-LM that uses retrieval quality to assign the interpolation coefficient. We empirically measure the effectiveness of our approach on two English language modeling datasets, Wikitext-103 and PG-19. Our re-formulation of the $k$NN-LM is beneficial in both cases, and leads to nearly 4% improvement in perplexity on the Wikitext-103 test set.
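The interpolation described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard $k$NN-LM mixture, p(w) = λ · p_kNN(w) + (1 − λ) · p_LM(w), where p_kNN is a softmax over negative neighbor distances. The abstract does not give the paper's exact rule for assigning λ from retrieval quality, so `quality_based_lambda`, its distance thresholds, and its λ values are hypothetical placeholders chosen only to show the idea that closer neighbors earn more weight.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=1.0):
    """Turn retrieved (distance, next_token) pairs into a token distribution.

    Each neighbor votes for its stored next token with weight
    exp(-distance / temperature); votes are normalized to sum to 1.
    """
    if not neighbors:
        return {}
    # Subtract the smallest distance for numerical stability (softmax trick).
    min_d = min(d for d, _ in neighbors)
    weights = defaultdict(float)
    for d, tok in neighbors:
        weights[tok] += math.exp(-(d - min_d) / temperature)
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def quality_based_lambda(neighbors, thresholds=(1.0, 5.0), lambdas=(0.6, 0.3, 0.05)):
    """Hypothetical rule: pick lambda from the distance of the closest neighbor.

    The thresholds and lambda values are illustrative assumptions,
    not values taken from the paper.
    """
    if not neighbors:
        return 0.0  # nothing retrieved: fall back to the base LM entirely
    best = min(d for d, _ in neighbors)
    for threshold, lam in zip(thresholds, lambdas):
        if best <= threshold:
            return lam
    return lambdas[-1]  # poor retrieval quality: mostly trust the base LM


def interpolate(p_lm, neighbors, lam=None):
    """Mix base-LM and kNN distributions: lam * p_knn + (1 - lam) * p_lm."""
    p_knn = knn_distribution(neighbors)
    if lam is None:
        lam = quality_based_lambda(neighbors)
    vocab = set(p_lm) | set(p_knn)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_lm.get(tok, 0.0)
        for tok in vocab
    }


if __name__ == "__main__":
    p_lm = {"cat": 0.5, "dog": 0.5}
    close = [(0.5, "cat"), (0.5, "cat")]   # high-quality retrieval
    far = [(9.0, "cat")]                   # low-quality retrieval
    print(interpolate(p_lm, close))
    print(interpolate(p_lm, far))
```

Passing a fixed `lam` to `interpolate` recovers the original $k$NN-LM with a single global coefficient, which makes it easy to compare against the quality-dependent variant on the same neighbors.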

