LG · Dec 10, 2025

Mixture of Lookup Key-Value Experts

arXiv:2512.09723v1 · 4.1

Originality: Incremental advance

AI Analysis

This is an incremental improvement for resource-constrained LLM inference on end-user devices.

The paper tackles the context-independent expert selection limitation of Mixture of Lookup Experts (MoLE) by proposing MoLKV, which uses key-value pairs for context-aware expert activation, achieving significantly lower validation loss in small-scale evaluations.

Recent research has developed several LLM architectures suitable for inference on end-user devices, such as the Mixture of Lookup Experts (MoLE) (Jie et al., 2025). A key feature of MoLE is that each token id is associated with a dedicated group of experts. For a given input, only the experts corresponding to the input token id will be activated. Since the communication overhead of loading this small number of activated experts into RAM during inference is negligible, expert parameters can be offloaded to storage, making MoLE suitable for resource-constrained devices. However, MoLE's context-independent expert selection mechanism, based solely on input ids, may limit model performance. To address this, we propose the Mixture of Lookup Key-Value Experts (MoLKV) model. In MoLKV, each expert is structured as a key-value pair. For a given input, the input-derived query interacts with the cached key-value experts from the current sequence, generating a context-aware expert output. This context-aware mechanism alleviates the limitation of MoLE, and experimental results demonstrate that MoLKV achieves significantly lower validation loss in small-scale evaluations.
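The abstract does not say how keys, values, or queries are computed, so the following is only a minimal sketch of the contrast it describes, under stated assumptions: each token id owns a small group of (key, value) expert vectors in an offloaded table, and the query scores cached experts with scaled dot-product softmax attention. All names (`make_expert_table`, `attend`, `mole_style_output`, `MoLKVSketch`) and the random initialization are hypothetical. The `mole_style_output` function is an illustrative simplification that restricts the same scoring to the current token's group; it is not MoLE's actual routing.

```python
import math
import random


def make_expert_table(vocab_size, experts_per_token, dim, seed=0):
    """Hypothetical offloaded table: token id -> group of (key, value) experts."""
    rng = random.Random(seed)

    def vec():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    return {
        tid: [(vec(), vec()) for _ in range(experts_per_token)]
        for tid in range(vocab_size)
    }


def attend(query, keys, values):
    """Scaled dot-product softmax attention of one query over key-value pairs."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(query))]
    return out, weights


def mole_style_output(table, token_id, query):
    """Context-independent baseline: only the current token's expert group is used."""
    group = table[token_id]
    out, _ = attend(query, [k for k, _ in group], [v for _, v in group])
    return out


class MoLKVSketch:
    """Context-aware variant: the query attends over every expert cached so far."""

    def __init__(self, table):
        self.table = table
        self.keys = []    # cached expert keys for the current sequence
        self.values = []  # cached expert values for the current sequence
        self.loads = 0    # number of expert groups fetched from storage

    def step(self, token_id, query):
        group = self.table[token_id]  # fetch only this token's expert group
        self.loads += 1
        self.keys.extend(k for k, _ in group)
        self.values.extend(v for _, v in group)
        out, _ = attend(query, self.keys, self.values)
        return out


if __name__ == "__main__":
    table = make_expert_table(vocab_size=10, experts_per_token=2, dim=4)
    layer = MoLKVSketch(table)
    rng = random.Random(1)
    for tid in [3, 7, 3]:
        query = [rng.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for an input-derived query
        print(tid, layer.step(tid, query), mole_style_output(table, tid, query))
```

On the first step the two outputs coincide, because the cache holds only the current token's experts. From the second token on, the MoLKV-style output also depends on experts fetched for earlier tokens in the sequence, while each step still loads only one token's expert group from storage, which is the property that makes offloading attractive.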
