How well does CLIP understand texture?
This work evaluates CLIP's texture comprehension for researchers in computer vision and AI. It is incremental: it applies existing methods to new data and introduces no novel techniques.
The paper investigates CLIP's understanding of texture in natural images by analyzing its zero-shot performance on texture classification datasets, its representation of compositional properties such as red dots or yellow stripes, and its usefulness for fine-grained bird categorization based on color and texture descriptions.
We investigate how well CLIP understands texture in natural images described by natural language. To this end, we analyze CLIP's ability to: (1) perform zero-shot learning on various texture and material classification datasets; (2) represent compositional properties of texture, such as red dots or yellow stripes, on the Describable Texture in Detail (DTDD) dataset; and (3) aid fine-grained categorization of birds in photographs described by the color and texture of their body parts.
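To make the zero-shot and compositional setup concrete, the following is a minimal sketch of how an image embedding could be scored against text prompts composed from color and pattern words. It is not the paper's implementation. The prompt template, the function names (`build_prompts`, `zero_shot_scores`), the temperature of 100, and the 4-dimensional toy embeddings are all assumptions made for illustration. In practice the embeddings would come from CLIP's image and text encoders.

```python
import math
from itertools import product


def build_prompts(colors, patterns, template="a photo of {} {}"):
    """Compose one text prompt per (color, pattern) pair (hypothetical template)."""
    return [template.format(c, p) for c, p in product(colors, patterns)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """Softmax over temperature-scaled cosine similarities.

    The temperature value is an assumption, not taken from the source.
    """
    logits = [temperature * cosine(image_emb, t) for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Prompts in order: red dots, red stripes, yellow dots, yellow stripes.
prompts = build_prompts(["red", "yellow"], ["dots", "stripes"])

# Toy stand-ins for encoder outputs; axes are (red, yellow, dots, stripes).
toy_text_embs = [
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
toy_image_emb = [0.1, 0.9, 0.2, 0.8]  # mostly yellow, mostly striped

probs = zero_shot_scores(toy_image_emb, toy_text_embs)
best = prompts[max(range(len(probs)), key=probs.__getitem__)]
print(best)
```

With these toy vectors the highest-scoring prompt is the yellow-stripes one. The same scoring applies to plain class names for zero-shot texture or material classification by replacing the composed prompts with one prompt per class.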