CV · CL · Dec 11, 2024

Visual Program Distillation with Template-Based Augmentation

arXiv:2412.08564v42 citations · h-index: 5 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the problem of high annotation and inference costs in visual programming for specialized domains, offering an incremental improvement through synthetic data augmentation.

The paper tackles the challenge of adapting visual programming for specialized visual tasks by proposing a low-cost distillation method that requires no human annotations, achieving high-quality program generation with small language models and faster inference.

Adapting visual programming, or prompting large language models (LLMs) to generate executable code for visual tasks such as visual question answering (VQA), to specialized tasks or domains remains challenging due to high annotation and inference costs. We propose a low-cost visual program distillation method that can be used for models with at most 1 billion parameters and requires no human-generated program annotations. We achieve this through synthetic data augmentation based on decoupling programs into higher-level skills, called templates, and their corresponding arguments. Experimental results show that, with a relatively small amount of question/answer data, small language models can generate high-quality specialized visual programs, with the added benefit of much faster inference.
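The template/argument decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that arguments are the quoted string literals in a program, and that new training pairs are produced by swapping those arguments in both the question and the program. The `exists(image, ...)` call and all function names (`make_template`, `fill`, `augment`) are hypothetical and chosen only for illustration.

```python
import re

# Assumption: "arguments" are the quoted string literals inside a program.
# The pattern matches single- or double-quoted strings (no escape handling).
_STRING_LITERAL = re.compile(r"""(['"])(.*?)\1""")


def make_template(question: str, program: str) -> tuple[str, str, list[str]]:
    """Mask arguments in a (question, program) pair with <ARGi> placeholders."""
    arguments: list[str] = []

    def mask(match: re.Match) -> str:
        quote, value = match.group(1), match.group(2)
        if value not in arguments:
            arguments.append(value)
        return f"{quote}<ARG{arguments.index(value)}>{quote}"

    program_template = _STRING_LITERAL.sub(mask, program)

    # Naive substring replacement; a real pipeline would need to handle
    # plurals, casing, and overlapping matches in the question text.
    question_template = question
    for i, value in enumerate(arguments):
        question_template = question_template.replace(value, f"<ARG{i}>")
    return question_template, program_template, arguments


def fill(template: str, arguments: list[str]) -> str:
    """Substitute concrete arguments back into a template."""
    for i, value in enumerate(arguments):
        template = template.replace(f"<ARG{i}>", value)
    return template


def augment(question: str, program: str,
            new_arguments: list[list[str]]) -> list[tuple[str, str]]:
    """Create synthetic (question, program) pairs by swapping in new arguments."""
    q_template, p_template, _ = make_template(question, program)
    return [(fill(q_template, args), fill(p_template, args))
            for args in new_arguments]


# Hypothetical seed example; `exists` is not a real API.
seed_question = "Is there a dog in the image?"
seed_program = "answer = exists(image, 'dog')"

for q, p in augment(seed_question, seed_program, [["cat"], ["traffic light"]]):
    print(q, "->", p)
```

Keeping the template fixed while varying the arguments means one seed pair yields many consistent question/program pairs, which is the general idea behind generating training data without human-written program annotations.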
