LG | AI | Mar 12, 2024

Conditional computation in neural networks: principles and research trends

arXiv:2403.07965v2 | 12 citations | h-index: 31 | Intell Artif
AI Analysis

The paper offers a foundational overview for researchers interested in improving neural network efficiency and modularity, but its contribution is incremental, since it synthesizes existing trends.

The paper provides a tutorial introduction to conditional computation in neural networks, summarizing principles and implementations like mixture-of-experts and early-exit networks, and analyzes benefits in efficiency, explainability, and transfer learning.

This article summarizes principles and ideas from the emerging area of applying conditional computation methods to the design of neural networks. In particular, we focus on neural networks that can dynamically activate or de-activate parts of their computational graph conditionally on their input. Examples include the dynamic selection of input tokens, layers (or sets of layers), and sub-modules inside each layer (e.g., channels in a convolutional filter). We first provide a general formalism to describe these techniques in a uniform way. Then, we introduce three notable implementations of these principles: mixture-of-experts (MoE) networks, token selection mechanisms, and early-exit neural networks. The paper aims to provide a tutorial-like introduction to this growing field. To this end, we analyze the benefits of these modular designs in terms of efficiency, explainability, and transfer learning, with a focus on emerging application areas ranging from automated scientific discovery to semantic communication.
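To make the idea of input-dependent activation concrete, the following is a minimal sketch of top-k mixture-of-experts routing in plain Python. It is an illustration under stated assumptions, not the paper's formalism: the linear gate, the function name `moe_forward`, the toy experts, and the gate weights are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=1):
    """Route input x to its top-k experts (hypothetical helper).

    x            -- input vector as a list of floats
    gate_weights -- one weight vector per expert (assumed linear gate)
    experts      -- list of callables mapping a vector to a vector
    Returns the combined output and the indices of the experts that ran.
    """
    # Gate: one score per expert, computed from the input itself.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(scores)

    # Keep only the k most probable experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)

    output = [0.0] * len(x)
    for i in top:
        y = experts[i](x)          # only selected experts are computed
        weight = probs[i] / norm   # renormalize over the selected experts
        output = [o + weight * y_j for o, y_j in zip(output, y)]
    return output, top


# Toy experts and gate weights, chosen only for illustration.
experts = [
    lambda v: [2.0 * t for t in v],
    lambda v: [t + 1.0 for t in v],
    lambda v: [-t for t in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

out, used = moe_forward([3.0, 0.5], gate_weights, experts, k=1)
print(used, out)  # [0] [6.0, 1.0]
```

With k=1 only one of the three experts is evaluated for this input, which is the source of the efficiency benefit the abstract refers to. A different input can select a different expert, so the active part of the computational graph depends on the input.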

