SI-Score: An image dataset for fine-grained analysis of robustness to object location, rotation and size
This work addresses the need for robustness evaluation in image understanding models. It is incremental in scope: it examines a few specific factors of variation and has no broad impact on the state of the art.
The authors assess the robustness of deep learning models to variations in object location, rotation, and size by introducing SI-Score, a synthetic dataset for fine-grained analysis, and find qualitative differences among ResNets, Vision Transformers, and CLIP.
Before deploying machine learning models, it is critical to assess their robustness. In the context of deep neural networks for image understanding, changing the object location, rotation and size may affect the predictions in non-trivial ways. In this work, we perform a fine-grained analysis of robustness with respect to these factors of variation using SI-Score, a synthetic dataset. In particular, we investigate ResNets, Vision Transformers and CLIP, and identify interesting qualitative differences among them.
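To make the idea of a fine-grained, per-factor analysis concrete, the following is a minimal sketch and not the authors' pipeline. It enumerates a small grid of rotation, size and location settings and reports accuracy separately for each value of one factor. The grid values, the function names and the toy stand-in classifier are all hypothetical and chosen only for illustration; they are not SI-Score's actual settings.

```python
from collections import defaultdict
from itertools import product

# Hypothetical factor grids (illustrative values, not SI-Score's actual settings).
ROTATIONS_DEG = [0, 90, 180, 270]
SIZES_PCT = [10, 50, 100]  # object area as a percentage of the image area
LOCATIONS = [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75)]  # relative (x, y) of object centre


def make_configs():
    """Enumerate every combination of rotation, size and location."""
    return [
        {"rotation": r, "size": s, "location": loc}
        for r, s, loc in product(ROTATIONS_DEG, SIZES_PCT, LOCATIONS)
    ]


def accuracy_by_factor(records, factor):
    """Group (config, correct) records by one factor; return accuracy per value."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for config, correct in records:
        key = config[factor]
        totals[key] += 1
        hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


def toy_model_is_correct(config):
    """Hypothetical stand-in for a real classifier.

    It fails on upside-down or very small objects, purely to produce
    a non-trivial pattern in the per-factor accuracies.
    """
    return config["rotation"] != 180 and config["size"] > 10


configs = make_configs()
records = [(c, toy_model_is_correct(c)) for c in configs]
rotation_acc = accuracy_by_factor(records, "rotation")
size_acc = accuracy_by_factor(records, "size")

for angle, acc in sorted(rotation_acc.items()):
    print(f"rotation={angle:3d} deg  accuracy={acc:.2f}")
```

In a real evaluation, the stand-in would be replaced by a model's prediction on an image rendered with the given configuration. Reporting accuracy per factor value, rather than as one aggregate number, is what allows qualitative differences between model families to show up.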