CV · AI · LG · Feb 21, 2024

Zero-shot generalization across architectures for visual classification

Evan Gerritz, Luciano Dyballa, Steven W. Zucker

arXiv:2402.14095v4 · 3.71 citations · h-index: 60 · Has Code · Tiny Papers @ ICLR

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of understanding generalization in deep learning and is aimed at researchers. It is incremental: it builds on existing work rather than introducing new methods.

The study examines how classification accuracy relates to generalization to unseen classes in deep networks. It finds that accuracy does not predict generalizability and that generalization varies non-monotonically with layer depth, across architectures ranging from CNNs to transformers.

Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth.
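The abstract does not spell out its generalizability measure, so the sketch below is an illustration only and is not the authors' method. It swaps in a simple stand-in score (leave-one-out nearest-centroid accuracy on embeddings of classes the network never saw during training) and computes it layer by layer, then checks whether the resulting depth profile is monotonic. The function names, the stand-in score, and the toy 1-D embeddings are all hypothetical assumptions.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical type: one embedding vector per image, e.g. a flattened layer activation.
Vector = Sequence[float]


def _distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _centroid(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def separability_proxy(embeddings: Dict[str, List[Vector]]) -> float:
    """Leave-one-out nearest-centroid accuracy over unseen classes.

    A hypothetical stand-in score, NOT the paper's generalization measure:
    each held-out embedding is assigned to the class whose centroid
    (computed without that embedding) is closest. Higher is better.
    """
    correct = 0
    total = 0
    for label, points in embeddings.items():
        for idx, point in enumerate(points):
            centroids = {}
            for other_label, other_points in embeddings.items():
                # Exclude the held-out point from its own class centroid.
                pool = [q for j, q in enumerate(other_points)
                        if not (other_label == label and j == idx)]
                if pool:
                    centroids[other_label] = _centroid(pool)
            predicted = min(centroids, key=lambda c: _distance(point, centroids[c]))
            correct += predicted == label
            total += 1
    return correct / total


def layer_profile(per_layer: List[Dict[str, List[Vector]]]) -> List[float]:
    """Score each layer's embeddings of the same unseen-class images."""
    return [separability_proxy(layer) for layer in per_layer]


def is_monotonic(scores: List[float]) -> bool:
    """True if the scores never decrease, or never increase, with depth."""
    pairs = list(zip(scores, scores[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Hypothetical 1-D toy embeddings of two unseen classes at three depths.
mixed = {"a": [[0.0], [3.0]], "b": [[1.0], [2.0]]}
separated = {"a": [[0.0], [0.2]], "b": [[5.0], [5.2]]}

profile = layer_profile([mixed, separated, mixed])
print(profile)                # [0.0, 1.0, 0.0]
print(is_monotonic(profile))  # False
```

In this toy case the score rises and then falls with depth, which is the kind of non-monotonic profile the abstract reports. With a real network, each dictionary would instead hold one layer's activations for images from held-out classes.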

