CV · AI · LG · Feb 21, 2024

Zero-shot generalization across architectures for visual classification

arXiv:2402.14095v4 · 1 citation · h-index: 5 · Tiny Papers @ ICLR
Originality: Synthesis-oriented
AI Analysis

This work addresses a question of interest to deep-learning researchers, namely how generalization relates to classification accuracy. It is incremental, however: it builds on existing work without introducing new methods.

The study investigated the relationship between classification accuracy and generalization to unseen classes in deep networks. It found that accuracy does not predict generalizability and that generalization varies non-monotonically with layer depth across architectures such as CNNs and transformers.

Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth.
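The abstract does not spell out its measure of generalizability. Purely as an illustration, one common way to probe whether a layer's representation extrapolates to unseen classes is few-shot nearest-centroid classification. You embed images from classes the network never trained on, build one prototype per class from a few support images, and score how often the remaining images land nearest their own prototype. The sketch below assumes that probe. The names `nearest_centroid_score` and `layerwise_scores` and the toy embeddings are hypothetical, and none of this should be read as the paper's actual measure or data.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_centroid_score(embeddings, labels, n_support=1):
    """Hypothetical unseen-class probe (an assumption, not the paper's measure).

    For each class, the first `n_support` samples define a prototype
    (their centroid). Every remaining sample is assigned to the class whose
    prototype is nearest in Euclidean distance. Returns the fraction of
    these query samples assigned to their true class (NaN if there are none).
    """
    by_class = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        by_class[lab].append(vec)

    prototypes = {lab: centroid(vecs[:n_support]) for lab, vecs in by_class.items()}

    correct = total = 0
    for lab, vecs in by_class.items():
        for vec in vecs[n_support:]:
            pred = min(prototypes, key=lambda c: math.dist(vec, prototypes[c]))
            correct += pred == lab
            total += 1
    return correct / total if total else float("nan")


def layerwise_scores(layer_embeddings, labels, n_support=1):
    """Run the probe on each layer's embeddings of the same unseen-class images."""
    return {
        layer: nearest_centroid_score(emb, labels, n_support)
        for layer, emb in layer_embeddings.items()
    }


# Toy, made-up embeddings of six unseen-class images taken from two layers.
labels = ["a", "a", "a", "b", "b", "b"]
layers = {
    "layer_1": [[0, 0], [1, 1], [0, 1], [1, 0], [0, 0], [1, 1]],
    "layer_2": [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]],
}
print(layerwise_scores(layers, labels))  # {'layer_1': 0.5, 'layer_2': 1.0}
```

You can run the same probe per layer and per architecture to get a score profile, which can then be compared against ordinary classification accuracy. On real network features there is no reason for that profile to rise monotonically with depth.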

Code implementations: 2 repos