CV LGAug 21, 2023

Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models

Peiyan Zhang, Haoyang Liu, Chaozhuo Li, Xing Xie, Sunghun Kim, Haohan Wang

arXiv:2308.10632v38.48 citationsh-index: 29

Originality Incremental advance

AI Analysis

This work addresses the need for more realistic robustness evaluation in machine learning, though it is incremental as it builds on existing foundation model concepts.

The paper tackles the problem of evaluating image classification model robustness beyond fixed benchmarks by introducing a new measurement that compares model performance to a foundation model as a surrogate oracle, and designs a method to generate perturbed samples within the same image-label structure for evaluation.

Machine learning has demonstrated remarkable performance over finite datasets, yet whether the scores over the fixed benchmarks can sufficiently indicate the model's performance in the real world is still in discussion. In reality, an ideal robust model will probably behave similarly to the oracle (e.g., the human users), thus a good evaluation protocol is probably to evaluate the models' behaviors in comparison to the oracle. In this paper, we introduce a new robustness measurement that directly measures the image classification model's performance compared with a surrogate oracle (i.e., a foundation model). Besides, we design a simple method that can accomplish the evaluation beyond the scope of the benchmarks. Our method extends the image datasets with new samples that are sufficiently perturbed to be distinct from the ones in the original sets, but are still bounded within the same image-label structure the original test image represents, constrained by a foundation model pretrained with a large amount of samples. As a result, our new method will offer us a new way to evaluate the models' robustness performance, free of limitations of fixed benchmarks or constrained perturbations, although scoped by the power of the oracle. In addition to the evaluation results, we also leverage our generated data to understand the behaviors of the model and our new evaluation strategies.

View on arXiv PDF

Similar