CL
Sep 21, 2024

Routing in Sparsely-gated Language Models responds to Context

arXiv:2409.14107v1
24 citations
h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work provides incremental insights into routing mechanisms for researchers in efficient large-scale language modeling.

The study investigated how context influences token-expert assignments in sparsely-gated language models. It found that routing in encoder layers depends mainly on semantic associations, refined by contextual cues, while routing in decoder layers is more variable and markedly less context-sensitive.

Language models (LMs) increasingly incorporate mixture-of-experts layers, each consisting of a router and a collection of experts, to scale up their parameter count under a fixed computational budget. Building on previous work indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace the routing decisions for similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, with contextual cues providing an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.
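
To make the router and the notion of tracing routing decisions concrete, here is a minimal sketch of top-k routing, together with one possible way to compare the experts chosen for the same token in two contexts. It is an illustration only, not the paper's method: the random router weights, the toy hidden states, the choice of k = 2, and the use of Jaccard overlap as a comparison measure are all assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(hidden, router_weights, k=2):
    """Return the set of k expert indices with the highest router probability.

    hidden: hidden state of one token (list of floats).
    router_weights: one weight row per expert (a linear router, assumed here).
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap between two expert sets: 1.0 if identical, 0.0 if disjoint."""
    return len(a & b) / len(a | b)


# Hypothetical setup: random router weights and toy hidden states.
rng = random.Random(0)
dim, num_experts = 8, 4
router_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

# The same token in two contexts, modeled (as an assumption) as a shared
# base vector plus a small context-dependent perturbation.
token_state = [rng.gauss(0, 1) for _ in range(dim)]
context_a = [h + 0.1 * rng.gauss(0, 1) for h in token_state]
context_b = [h + 0.1 * rng.gauss(0, 1) for h in token_state]

experts_a = route_top_k(context_a, router_weights)
experts_b = route_top_k(context_b, router_weights)
overlap = jaccard(experts_a, experts_b)
```

In this toy setting, an overlap near 1 would mean the token is routed to the same experts regardless of context, while a lower overlap would suggest that context shifts the assignment.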

