CV · LG · Nov 24, 2025

Understanding Task Transfer in Vision-Language Models

arXiv:2511.18787v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses efficient task-specific finetuning of VLMs and offers actionable guidance for mitigating negative interference and exploiting positive transfer. It is an incremental advance, since it builds on existing VLM capabilities.

The paper tackles the unpredictable performance changes that arise when Vision-Language Models (VLMs) are finetuned on visual perception tasks. It systematically studies task transferability and introduces the Perfection Gap Factor (PGF) metric to quantify transfer effects, revealing patterns of positive and negative transfer across 13 tasks.

Vision-Language Models (VLMs) perform well on multimodal benchmarks but lag behind humans and specialized models on visual perception tasks like depth estimation or object counting. Finetuning on one task can unpredictably affect performance on others, making task-specific finetuning challenging. In this paper, we address this challenge through a systematic study of task transferability. We examine how finetuning a VLM on one perception task affects its zero-shot performance on others. To quantify these effects, we introduce Perfection Gap Factor (PGF), a metric that captures both the breadth and magnitude of transfer. Using three open-weight VLMs evaluated across 13 perception tasks, we construct a task-transfer graph that reveals previously unobserved relationships among perception tasks. Our analysis uncovers patterns of positive and negative transfer, identifies groups of tasks that mutually influence each other, organizes tasks into personas based on their transfer behavior and demonstrates how PGF can guide data selection for more efficient training. These findings highlight both opportunities for positive transfer and risks of negative interference, offering actionable guidance for advancing VLMs.
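The abstract does not state the PGF formula, so the sketch below should not be read as the paper's definition. It only illustrates the general idea of a task-transfer graph: finetune on a source task, measure the change on every other task, and normalise that change by how far the base model was from a perfect score. The function names, the gap-normalised score, and all numbers are assumptions made for illustration.

```python
from typing import Dict, Tuple


def gap_normalized_gain(base: float, finetuned: float, perfect: float = 1.0) -> float:
    """Hypothetical score (not the paper's PGF): the share of the remaining
    gap to a perfect score that finetuning closed. Negative if the score dropped."""
    gap = perfect - base
    if gap <= 0:
        raise ValueError("base score must be below the perfect score")
    return (finetuned - base) / gap


def transfer_edges(
    base: Dict[str, float],
    finetuned: Dict[str, Dict[str, float]],
) -> Dict[Tuple[str, str], float]:
    """Build directed edges (source task, target task) -> normalised gain.

    base:      zero-shot score of the base model on each task.
    finetuned: finetuned[source][target] is the score on `target`
               after finetuning on `source`.
    """
    edges = {}
    for source, scores in finetuned.items():
        for target, score in scores.items():
            if target != source:  # keep only cross-task transfer
                edges[(source, target)] = gap_normalized_gain(base[target], score)
    return edges


# Made-up scores for two tasks, purely illustrative.
base = {"depth": 0.50, "counting": 0.60}
finetuned = {
    "depth": {"depth": 0.80, "counting": 0.70},
    "counting": {"depth": 0.40, "counting": 0.90},
}
edges = transfer_edges(base, finetuned)

for (source, target), gain in sorted(edges.items()):
    kind = "positive" if gain > 0 else "negative"
    print(f"{source} -> {target}: {gain:+.2f} ({kind} transfer)")
```

With these made-up numbers, finetuning on depth helps counting while finetuning on counting hurts depth, which is the kind of asymmetric relationship a directed transfer graph is meant to expose.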
