CV · AI · CY · Aug 5, 2021

Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications

arXiv:2108.02818v1 · 170 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need to evaluate AI models on more than accuracy, focusing on deployment-critical features such as bias and safety. The contribution is incremental: it builds on existing calls for change in how models are assessed.

The paper analyzes CLIP, a generalizable computer vision model. It highlights that CLIP reduces the need for task-specific training data and lets users specify classes flexibly in natural language, and it finds that CLIP can inherit biases from prior systems, which raises safety concerns for deployment.

Recently, there have been breakthroughs in computer vision ("CV") models that are more generalizable with the advent of models such as CLIP and ALIGN. In this paper, we analyze CLIP and highlight some of the challenges such models pose. CLIP reduces the need for task-specific training data, potentially opening up many niche tasks to automation. CLIP also allows its users to flexibly specify image classification classes in natural language, which we find can shift how biases manifest. Additionally, through some preliminary probes we find that CLIP can inherit biases found in prior computer vision systems. Given the wide and unpredictable domain of uses for such models, this raises questions regarding what sufficiently safe behaviour for such systems may look like. These results add evidence to the growing body of work calling for a change in the notion of a 'better' model--to move beyond simply looking at higher accuracy on task-oriented capability evaluations, and towards a broader 'better' that takes into account deployment-critical features such as different use contexts, and people who interact with the model when thinking about model deployment.
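The abstract's point that classes can be specified in natural language, and that the choice of classes can change what the model outputs, can be illustrated with a minimal sketch. It assumes the commonly described zero-shot recipe: embed the image and each class prompt, then take a softmax over scaled cosine similarities. The embedding vectors, labels, and temperature value below are hypothetical toy values chosen for illustration; they are not produced by CLIP and are not results from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_vec, class_vecs, temperature=0.01):
    """Return {label: probability} from a softmax over scaled cosine similarities.

    `temperature` is an assumed illustrative value, not a setting from the paper.
    """
    logits = {label: cosine(image_vec, vec) / temperature
              for label, vec in class_vecs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def top_label(probs):
    """Label with the highest probability."""
    return max(probs, key=probs.get)


# Hypothetical toy embeddings standing in for image and text encoder outputs.
image_vec = [0.9, 0.1, 0.3]
text_vecs = {
    "a photo of a cat": [1.0, 0.0, 0.2],
    "a photo of a dog": [0.2, 1.0, 0.1],
    "a photo of an animal": [0.8, 0.1, 0.3],
}

# The same image, classified against two different user-chosen class sets.
narrow_classes = {label: text_vecs[label]
                  for label in ("a photo of a cat", "a photo of a dog")}
narrow_probs = zero_shot_classify(image_vec, narrow_classes)
full_probs = zero_shot_classify(image_vec, text_vecs)

print(top_label(narrow_probs))
print(top_label(full_probs))
```

In this toy setup the predicted label for the same image depends on which class prompts the user supplies, which is the kind of class-design sensitivity the abstract refers to.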

