LGMLJun 20, 2025

How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension

arXiv:2506.16704v22 citationsh-index: 3
Originality Highly original
AI Analysis

This work addresses a fundamental theoretical question in domain generalization for machine learning researchers, providing a tight characterization of domain sample complexity.

The paper tackles the problem of determining how many domains are needed for domain generalization by introducing the domain shattering dimension, which characterizes domain sample complexity and shows a tight relationship with VC dimension, proving that PAC-learnable hypothesis classes remain learnable in this setting.

We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes