CV, HC | Sep 9, 2024

VFA: Vision Frequency Analysis of Foundation Models and Human

arXiv:2409.05817v1 | 1 citation | h-index: 10
Originality: Incremental advance
AI Analysis

This work aims to improve out-of-distribution generalization in computer vision models by aligning them with human perception. It is rated incremental because it builds on existing research.

The study starts from the observation that machine learning models struggle with distribution shifts. It investigates how characteristics such as model size, data size, semantic richness, and multimodality affect alignment with human perception. It finds that these factors improve both human alignment and robustness, and that human alignment correlates strongly with out-of-distribution accuracy.

Machine learning models often struggle with distribution shifts in real-world scenarios, whereas humans exhibit robust adaptation. Models that better align with human perception may achieve higher out-of-distribution generalization. In this study, we investigate how various characteristics of large-scale computer vision models influence their alignment with human capabilities and robustness. Our findings indicate that increasing model and data size and incorporating rich semantic information and multiple modalities enhance models' alignment with human perception and their overall robustness. Our empirical analysis demonstrates a strong correlation between out-of-distribution accuracy and human alignment.
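The correlation claim in the abstract can be illustrated with a minimal sketch. It assumes that each model has one out-of-distribution accuracy and one scalar human-alignment score, and that a Pearson coefficient is a reasonable summary. The abstract names neither the alignment metric nor the correlation statistic, so both are assumptions here. The function name `pearson` and all numbers are hypothetical placeholders, not values from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-model scores, invented for illustration only.
# They are NOT results reported in the paper.
ood_accuracy = [0.42, 0.55, 0.61, 0.70]
human_alignment = [0.30, 0.41, 0.52, 0.66]

r = pearson(ood_accuracy, human_alignment)
print(f"Pearson r = {r:.3f}")
```

A rank-based statistic such as Spearman's coefficient may be a better fit when the relationship is monotonic but not linear. The abstract does not say which one the authors use.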

Code Implementations: 1 repo