CV HC

Sep 9, 2024

VFA: Vision Frequency Analysis of Foundation Models and Human

Mohammad-Javad Darvishi-Bayazi, Md Rifat Arefin, Jocelyn Faubert, Irina Rish

arXiv:2409.05817v1

3.71 citations

h-index: 38

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving out-of-distribution generalization in computer vision models by aligning them with human perception. The contribution is incremental, as it builds on existing research.

The study addresses the difficulty machine learning models have with distribution shifts by investigating how characteristics such as model size, data size, and multimodality affect alignment with human perception. It finds that these factors improve both human alignment and robustness, and that human alignment correlates strongly with out-of-distribution accuracy.

Machine learning models often struggle with distribution shifts in real-world scenarios, whereas humans exhibit robust adaptation. Models that better align with human perception may achieve higher out-of-distribution generalization. In this study, we investigate how various characteristics of large-scale computer vision models influence their alignment with human capabilities and robustness. Our findings indicate that increasing model and data size and incorporating rich semantic information and multiple modalities enhance models' alignment with human perception and their overall robustness. Our empirical analysis demonstrates a strong correlation between out-of-distribution accuracy and human alignment.
