CV | AI | Jul 3, 2023

Review of Large Vision Models and Visual Prompt Engineering

arXiv:2307.00855v1 | 236 citations | h-index: 61
Originality: Synthesis-oriented
AI Analysis

The paper provides a comprehensive overview for researchers working on visual AI, but its contribution is incremental because it reviews existing work rather than introducing new methods.

This review summarizes methods for large vision models and visual prompt engineering, exploring advancements to achieve zero-shot capabilities in computer vision tasks.

Visual prompt engineering is a fundamental technology in the field of visual and image Artificial General Intelligence, serving as a key component for achieving zero-shot capabilities. As the development of large vision models progresses, the importance of prompt engineering becomes increasingly evident. Designing suitable prompts for specific visual tasks has emerged as a meaningful research direction. This review aims to summarize the methods employed in the computer vision domain for large vision models and visual prompt engineering, exploring the latest advancements in visual prompt engineering. We present influential large models in the visual domain and a range of prompt engineering methods employed on these models. It is our hope that this review provides a comprehensive and systematic description of prompt engineering methods based on large visual models, offering valuable insights for future researchers in their exploration of this field.

