CVAILGJun 8, 2022

Sparse Mixture-of-Experts are Domain Generalizable Learners

arXiv:2206.04046v6110 citationsh-index: 23
Originality Highly original
AI Analysis

This addresses the problem of out-of-distribution generalization in machine learning for vision tasks, offering a complementary approach to existing methods.

The paper tackles domain generalization by proposing a novel backbone architecture, Generalizable Mixture-of-Experts (GMoE), which outperforms state-of-the-art methods on DomainBed by a large margin when trained with empirical risk minimization.

Human visual perception can easily generalize to out-of-distributed visual data, which is far beyond the capability of modern machine learning models. Domain generalization (DG) aims to close this gap, with existing DG methods mainly focusing on the loss function design. In this paper, we propose to explore an orthogonal direction, i.e., the design of the backbone architecture. It is motivated by an empirical finding that transformer-based models trained with empirical risk minimization (ERM) outperform CNN-based models employing state-of-the-art (SOTA) DG algorithms on multiple DG datasets. We develop a formal framework to characterize a network's robustness to distribution shifts by studying its architecture's alignment with the correlations in the dataset. This analysis guides us to propose a novel DG model built upon vision transformers, namely Generalizable Mixture-of-Experts (GMoE). Extensive experiments on DomainBed demonstrate that GMoE trained with ERM outperforms SOTA DG baselines by a large margin. Moreover, GMoE is complementary to existing DG methods and its performance is substantially improved when trained with DG algorithms.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes