CL | Apr 8, 2025

Multi-Sense Embeddings for Language Models and Knowledge Distillation

arXiv:2504.06036v2 | 3 citations | h-index: 5 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This work addresses the challenge of capturing word senses in language models for NLP applications and offers a distillation method to improve efficiency. It is incremental, however, because it builds on existing embedding and clustering techniques.

The paper tackles the problem of representing multiple senses of words in language models. It proposes multi-sense embeddings as a drop-in replacement for tokens and introduces a knowledge distillation method that uses these embeddings to train smaller models, which maintain competitive performance while saving space and inference time.

Transformer-based large language models (LLMs) rely on contextual embeddings which generate different (continuous) representations for the same token depending on its surrounding context. Nonetheless, words and tokens typically have a limited number of senses (or meanings). We propose multi-sense embeddings as a drop-in replacement for each token in order to capture the range of their uses in a language. To construct a sense embedding dictionary, we apply a clustering algorithm to embeddings generated by an LLM and consider the cluster centers as representative sense embeddings. In addition, we propose a novel knowledge distillation method that leverages the sense dictionary to learn a smaller student model that mimics the senses from the much larger base LLM, offering significant space and inference time savings, while maintaining competitive performance. Via thorough experiments on various benchmarks, we showcase the effectiveness of our sense embeddings and knowledge distillation approach. We share our code at https://github.com/Qitong-Wang/SenseDict
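
The sense-dictionary construction described in the abstract can be illustrated with a minimal sketch. It is not the code from the SenseDict repository. The abstract only says "a clustering algorithm", so plain k-means is an assumption here. The function names (`kmeans`, `build_sense_dict`, `nearest_sense`), the fixed number of senses per token, and the synthetic 4-dimensional vectors standing in for LLM contextual embeddings are all hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering algorithm).

    Returns (centers, labels) for an (n, d) array of points.
    """
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance from every point to every center: shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    # Final assignment against the converged centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)


def build_sense_dict(embeddings_by_token, senses_per_token):
    """Map each token to its cluster centers, used as sense embeddings."""
    sense_dict = {}
    for token, embs in embeddings_by_token.items():
        embs = np.asarray(embs, dtype=float)
        k = min(senses_per_token, len(embs))
        centers, _ = kmeans(embs, k)
        sense_dict[token] = centers
    return sense_dict


def nearest_sense(sense_dict, token, contextual_embedding):
    """Return (index, vector) of the sense closest to a contextual embedding."""
    centers = sense_dict[token]
    dists = np.linalg.norm(centers - contextual_embedding, axis=1)
    idx = int(dists.argmin())
    return idx, centers[idx]


# Synthetic stand-in for contextual embeddings of one token with two senses.
rng = np.random.default_rng(0)
dim = 4
money = rng.normal(loc=0.0, scale=0.5, size=(50, dim))   # "bank" as institution
river = rng.normal(loc=10.0, scale=0.5, size=(50, dim))  # "bank" as riverside
embeddings_by_token = {"bank": np.vstack([money, river])}

sense_dict = build_sense_dict(embeddings_by_token, senses_per_token=2)
idx, sense = nearest_sense(sense_dict, "bank", river[0])
```

In this sketch a new occurrence of a token is represented by looking up the nearest stored sense vector rather than by a fresh contextual embedding. That lookup is the sense in which a dictionary entry could act as a drop-in replacement.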
