CV AI LG | May 27, 2025

In Context Learning with Vision Transformers: Case Study

Antony Zhao, Alex Proshkin, Fergal Hennessy, Francesco Crivelli

arXiv:2505.20872v1

Originality: Synthesis-oriented

AI Analysis

This is an incremental study of image processing tasks, aimed at AI researchers.

The paper investigates whether vision transformers can learn complex functions on images in context, such as those computed by convolutional neural networks, extending prior work on simpler function classes over random data.

Large transformer models have been shown to be capable of in-context learning. Given examples in a prompt together with a query, they can perform few-shot, one-shot, or zero-shot learning and output the corresponding answer to the query. Of particular interest to us is that these models have been shown to learn general classes of functions, such as linear functions and small 2-layer neural networks, on random data (Garg et al., 2023). We aim to extend this to the image space and analyze their ability to in-context learn more complex functions on images, such as convolutional neural networks and other methods.
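To make the prompt structure concrete, here is a minimal sketch of how one in-context example sequence could be built when the function class lives on images. It is an illustration under stated assumptions, not the authors' setup: the abstract does not specify the function class in detail, so the single random kernel, the scalar output obtained by averaging, the image and kernel sizes, and the name `make_icl_prompt` are all hypothetical.

```python
import numpy as np


def make_icl_prompt(rng, n_examples=8, image_size=8, kernel_size=3):
    """Build one in-context prompt for a randomly sampled image function.

    Assumption (not from the paper): f applies one random kernel as a
    'valid' 2D cross-correlation and averages the result to a scalar.
    """
    kernel = rng.standard_normal((kernel_size, kernel_size))

    def f(image):
        # All kernel-sized windows of the image, shape (H', W', k, k).
        windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
        # Cross-correlate with the kernel, then average over positions.
        return float((windows * kernel).sum(axis=(-2, -1)).mean())

    # Random inputs: n_examples context images plus one query image.
    images = rng.standard_normal((n_examples + 1, image_size, image_size))
    targets = np.array([f(img) for img in images])

    # The context is the list of (x_i, f(x_i)) pairs; the model would be
    # asked to predict f(query) without ever seeing the kernel.
    context = list(zip(images[:-1], targets[:-1]))
    query, answer = images[-1], targets[-1]
    return context, query, answer, f


rng = np.random.default_rng(0)
context, query, answer, f = make_icl_prompt(rng)
print(len(context), query.shape)
```

A fresh kernel is drawn for every prompt, so a model trained on such sequences cannot memorize one function and has to infer it from the context pairs, which is the sense of "learning a class of functions" used above.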
