CV · Nov 28, 2023

LLaFS: When Large Language Models Meet Few-Shot Segmentation

arXiv:2311.16926v5 · 88 citations · h-index: 25
Originality: Highly original
AI Analysis

This work addresses data scarcity in image segmentation, a problem relevant to computer vision researchers, and represents a paradigm shift rather than an incremental improvement.

The paper tackles few-shot segmentation by leveraging large language models (LLMs) as a supplement to limited annotated data, achieving state-of-the-art results on multiple datasets.

This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in few-shot segmentation. Unlike conventional few-shot segmentation methods, which rely only on the limited and biased information in the annotated support images, LLaFS uses the vast prior knowledge acquired by the LLM as an effective supplement and directly uses the LLM to segment images in a few-shot manner. To enable the text-based LLM to handle image-related tasks, we carefully design an input instruction that allows the LLM to produce segmentation results represented as polygons, and propose a region-attribute table that simulates the human visual mechanism and provides multi-modal guidance. We also synthesize pseudo samples to augment the training data and pretrain with curriculum learning for better optimization. LLaFS achieves state-of-the-art results on multiple datasets, showing the potential of using LLMs for few-shot computer vision tasks.
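Because the LLM answers in text, its polygon output must be converted back into a pixel mask before it can be compared with ground truth. The sketch below shows one way to do that, under stated assumptions: the answer format (a list of `(x, y)` pairs in the text) and the helper names `parse_polygon`, `point_in_polygon` and `rasterize` are hypothetical, not taken from the paper, and the even-odd pixel-center test is a common rasterization choice rather than necessarily the one LLaFS uses.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical answer format: "(x, y)" pairs anywhere in the LLM's text output.
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_polygon(text: str) -> List[Point]:
    """Extract (x, y) vertices from an answer like 'Polygon: [(2, 1), (6, 1), ...]'."""
    return [(float(x), float(y)) for x, y in _PAIR.findall(text)]


def point_in_polygon(px: float, py: float, poly: List[Point]) -> bool:
    """Even-odd (ray casting) test: toggle 'inside' for each edge crossed by a ray toward +x."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Half-open straddle test, so a vertex on the ray is not counted twice.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(poly: List[Point], height: int, width: int) -> List[List[int]]:
    """Binary mask in which a pixel is foreground if its center lies inside the polygon."""
    if len(poly) < 3:  # degenerate output: fewer than three vertices gives an empty mask
        return [[0] * width for _ in range(height)]
    return [
        [1 if point_in_polygon(c + 0.5, r + 0.5, poly) else 0 for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    answer = "Polygon: [(2, 1), (6, 1), (6, 4), (2, 4)]"
    mask = rasterize(parse_polygon(answer), height=6, width=8)
    for row in mask:
        print("".join(str(v) for v in row))
```

On the toy rectangle above, the mask marks the 4 × 3 block of pixels whose centers fall inside the polygon. Sampling at pixel centers keeps the rule unambiguous for edges that lie exactly on integer coordinates.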

