AI | Apr 1

Self-Routing: Parameter-Free Expert Routing from Hidden States

arXiv:2604.00421 | 51.41 citations | h-index: 32
AI Analysis

This work targets the parameter overhead of the router in MoE models, a concern for machine learning practitioners. The contribution is incremental, since it builds on existing MoE frameworks.

The paper asks whether Mixture-of-Experts layers need a learned router and proposes Self-Routing, a parameter-free method that uses part of the hidden state directly as routing logits. It finds that Self-Routing remains competitive with learned routers while improving expert utilization, with about 17% higher average normalized routing entropy and a slight gain on ImageNet-1K.

Mixture-of-Experts (MoE) layers increase model capacity by activating only a small subset of experts per token, and typically rely on a learned router to map hidden states to expert assignments. In this work, we ask whether a dedicated learned router is strictly necessary in the MoE settings we study. We propose Self-Routing, a parameter-free routing mechanism that uses a designated subspace of the token hidden state directly as expert logits, eliminating the router projection entirely while leaving the rest of the MoE layer unchanged. We evaluate Self-Routing on GPT-2-scale language modeling and ImageNet-1K classification by comparing it against a standard learned router, random-routing baselines, and dense non-MoE baselines. Our results show that Self-Routing remains competitive with the learned-router baseline while removing all dedicated routing parameters, and yields more balanced expert utilization, with about 17% higher average normalized routing entropy and no explicit load-balancing loss. On ImageNet-1K with DeiT-S/16, Self-Routing also slightly improves over the corresponding learned-router MoE. These findings suggest that effective MoE routing can emerge from the hidden representation itself without requiring a separate learned router module.
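
The routing idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the "designated subspace" is taken to be the first `num_experts` coordinates of the hidden state, gates are a softmax renormalized over the top-k experts, and the normalized routing entropy is computed as the entropy of the empirical expert-assignment distribution divided by log(`num_experts`). The function names `self_route` and `normalized_routing_entropy` are hypothetical.

```python
import numpy as np

def self_route(hidden, num_experts, top_k=2):
    """Route tokens without a router projection.

    hidden: array of shape (tokens, d_model), with d_model >= num_experts.
    Returns (expert_idx, gates), each of shape (tokens, top_k).
    """
    # Assumption: the designated subspace is the first num_experts coordinates.
    # These are read directly as expert logits; no learned weights are involved.
    logits = hidden[:, :num_experts]

    # Numerically stable softmax over experts.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Pick the top_k experts per token and renormalize their gate weights.
    expert_idx = np.argsort(-logits, axis=-1)[:, :top_k]
    gates = np.take_along_axis(probs, expert_idx, axis=-1)
    gates /= gates.sum(axis=-1, keepdims=True)
    return expert_idx, gates

def normalized_routing_entropy(expert_idx, num_experts):
    """Entropy of the expert-assignment histogram, scaled to [0, 1]."""
    counts = np.bincount(np.asarray(expert_idx).ravel(), minlength=num_experts)
    p = counts / counts.sum()
    nonzero = p[p > 0]  # skip empty experts to avoid log(0)
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(num_experts))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(64, 32))  # 64 tokens, d_model = 32
    idx, gates = self_route(hidden, num_experts=8, top_k=2)
    print(idx.shape, gates.shape)
    print(normalized_routing_entropy(idx, num_experts=8))
```

Under this definition, a value of 1 means tokens are spread evenly across experts and 0 means every token goes to the same expert. The paper's exact entropy definition and choice of subspace may differ from this sketch.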

