CV · LG · RO · Feb 12, 2024

Task-conditioned adaptation of visual features in multi-task policy learning

arXiv:2402.07739v49 citations · h-index: 16 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses multi-task policy learning for autonomous agents through task-conditioned adaptation of visual features. The contribution appears incremental because it builds on pre-trained models and existing benchmarks.

The paper adapts the visual perception modules of an autonomous agent conditioned on the task at hand. It shows that a single policy can handle a wide variety of tasks and that the method generalizes to unseen tasks given a few demonstrations.

Successfully addressing a wide variety of tasks is a core ability of autonomous agents, requiring flexible adaptation of the underlying decision-making strategies and, as we argue in this work, of the perception modules as well. An analogy is the human visual system, which uses top-down signals to focus attention according to the current task. Similarly, we adapt pre-trained large vision models conditioned on specific downstream tasks in the context of multi-task policy learning. We introduce task-conditioned adapters that do not require finetuning any pre-trained weights, combined with a single policy trained with behavior cloning and capable of addressing multiple tasks. We condition the visual adapters on task embeddings, which can be selected at inference if the task is known, or alternatively inferred from a set of example demonstrations. To this end, we propose a new optimization-based estimator. We evaluate the method on a wide variety of tasks from the CortexBench benchmark and show that, in contrast to existing work, these tasks can be addressed with a single policy. In particular, we demonstrate that adapting visual features is a key design choice and that the method generalizes to unseen tasks given a few demonstrations.
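The idea of inferring a task embedding from demonstrations while all trained weights stay frozen can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the adapter architecture or the estimator, so the FiLM-style gain `G`, the linear policy `P`, the function names, and the step-halving gradient descent below are all assumptions chosen to keep the example short and runnable.

```python
import numpy as np

def adapt_features(features, task_emb, G):
    """Scale frozen visual features by a task-dependent gain (FiLM-like; an assumption)."""
    return features * (1.0 + G @ task_emb)

def policy_actions(adapted, P):
    """Hypothetical linear policy head mapping adapted features to actions."""
    return adapted @ P.T

def bc_loss(task_emb, features, demo_actions, G, P):
    """Behavior-cloning loss: mean squared action error over the demonstrations."""
    residual = policy_actions(adapt_features(features, task_emb, G), P) - demo_actions
    return float((residual ** 2).sum(axis=1).mean())

def estimate_task_embedding(features, demo_actions, G, P, steps=300, lr=0.1):
    """Optimize only the task embedding; G and P are never updated."""
    emb = np.zeros(G.shape[1])
    loss = bc_loss(emb, features, demo_actions, G, P)
    for _ in range(steps):
        residual = policy_actions(adapt_features(features, emb, G), P) - demo_actions
        # Analytic gradient of the loss with respect to the embedding.
        grad = (2.0 / len(features)) * (G.T @ (features * (residual @ P)).sum(axis=0))
        candidate = emb - lr * grad
        cand_loss = bc_loss(candidate, features, demo_actions, G, P)
        if cand_loss < loss:
            emb, loss = candidate, cand_loss   # accept the descent step
        else:
            lr *= 0.5                          # step too large: shrink and retry
    return emb

# Synthetic "demonstrations" generated from a hidden embedding (toy data only).
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 8))            # stand-in for frozen encoder outputs
G = 0.3 * rng.normal(size=(8, 3))              # frozen adapter weights
P = 0.3 * rng.normal(size=(2, 8))              # frozen policy weights
true_emb = np.array([1.0, -0.5, 0.25])
demo_actions = policy_actions(adapt_features(features, true_emb, G), P)

estimated = estimate_task_embedding(features, demo_actions, G, P)
print("loss at zero embedding:", bc_loss(np.zeros(3), features, demo_actions, G, P))
print("loss at estimated embedding:", bc_loss(estimated, features, demo_actions, G, P))
```

The point of the sketch is the split of roles: for a known task the embedding would simply be looked up, while for an unseen task only the embedding vector is optimized against a few demonstrations and every other parameter is left untouched.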

