CV, AI, GR, LG | Jun 4, 2024

Learning to Edit Visual Programs with Self-Supervision

arXiv:2406.02383v2 | 7 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving visual program accuracy in domains that lack program annotations. Its editing-based paradigm is an incremental advance over one-shot program prediction, but it delivers concrete accuracy gains.

The paper tackles the problem of editing visual programs so that they better match a visual target. It introduces a self-supervised learning approach that trains an edit network jointly with a one-shot prediction model. Across multiple domains, the combined system infers more accurate visual programs than the one-shot model alone, even when both are given equal search-time budgets.

We design a system that learns how to edit visual programs. Our edit network consumes a complete input program and a visual target. From this input, we task our network with predicting a local edit operation that could be applied to the input program to improve its similarity to the target. In order to apply this scheme for domains that lack program annotations, we develop a self-supervised learning approach that integrates this edit network into a bootstrapped finetuning loop along with a network that predicts entire programs in one-shot. Our joint finetuning scheme, when coupled with an inference procedure that initializes a population from the one-shot model and evolves members of this population with the edit network, helps to infer more accurate visual programs. Over multiple domains, we experimentally compare our method against the alternative of using only the one-shot model, and find that even under equal search-time budgets, our editing-based paradigm provides significant advantages.
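The inference procedure described in the abstract (initialize a population from the one-shot model, then evolve its members with the edit network) can be illustrated with a minimal sketch. Everything below is a toy stand-in and not the authors' implementation: programs and targets are assumed to be tuples of integers, `toy_one_shot` is a hypothetical noisy guesser in place of the one-shot network, `toy_edit` is a hypothetical oracle that fixes one mismatched token in place of the learned edit network, and `score` is an assumed similarity measure.

```python
import random
from typing import List, Tuple

# Toy assumption: a "program" and its "visual target" are both integer tuples.
Program = Tuple[int, ...]
Target = Tuple[int, ...]


def score(program: Program, target: Target) -> int:
    """Toy similarity: negative count of mismatched positions (0 is a perfect match)."""
    return -sum(p != t for p, t in zip(program, target))


def toy_one_shot(target: Target, rng: random.Random) -> Program:
    """Hypothetical stand-in for the one-shot model: a noisy guess of the target."""
    return tuple(t if rng.random() < 0.5 else rng.randrange(10) for t in target)


def toy_edit(program: Program, target: Target) -> Program:
    """Hypothetical stand-in for the edit network: apply one local edit.

    Here the edit is an oracle that fixes the first mismatched token; a learned
    network would instead predict an edit from the program and the target.
    """
    for i, (p, t) in enumerate(zip(program, target)):
        if p != t:
            return program[:i] + (t,) + program[i + 1:]
    return program  # already matches the target; nothing to edit


def infer(target: Target, population_size: int = 4, rounds: int = 6, seed: int = 0) -> Program:
    """Initialize a population from the one-shot stand-in, then evolve it with edits."""
    rng = random.Random(seed)
    population: List[Program] = [toy_one_shot(target, rng) for _ in range(population_size)]
    for _ in range(rounds):
        # Propose one edited variant per member, keeping the originals as candidates.
        edited = [toy_edit(p, target) for p in population]
        candidates = population + edited
        # Keep the programs most similar to the target for the next round.
        candidates.sort(key=lambda p: score(p, target), reverse=True)
        population = candidates[:population_size]
    return population[0]


if __name__ == "__main__":
    target = (3, 1, 4, 1, 5, 9)
    best = infer(target, rounds=len(target))
    print(best, score(best, target))
```

Because parents stay in the candidate pool, the best score in the population never decreases from one round to the next. In this sketch the number of rounds plays the role of the search-time budget mentioned in the abstract.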

Code Implementations: 1 repo