CV · AI · CL · May 11, 2023

Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts

arXiv:2305.07019v2 · 3 citations
AI Analysis

This work addresses the challenge of building a single, unified model for multiple vision-language tasks. Its contribution is incremental, as it builds on existing multi-task learning approaches.

The researchers tackled interference among heterogeneous tasks when training a single multi-task vision-language model, and achieved results comparable to or better than strong single-task baselines across multiple tasks.

We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks which may interfere with each other, resulting in a single model which we named Musketeer. The integration of knowledge across heterogeneous tasks is enabled by a novel feature called Task Explanation Prompt (TEP). With rich and structured information such as task input/output format, TEP reduces interference among tasks, allowing the model to focus on their shared structure. With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
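The abstract describes the Task Explanation Prompt only at a high level, as "rich and structured information such as task input/output format." As a rough illustration, the sketch below assumes a TEP can be rendered as a plain-text prefix that is prepended to each example before it reaches one shared model. The class `TaskExplanationPrompt`, its field names, the serialization format, and the two example tasks are all hypothetical choices made for this illustration, not Musketeer's actual prompt schema or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskExplanationPrompt:
    """Hypothetical container for a TEP-like structured task description.

    Field names are illustrative assumptions, not the paper's exact schema.
    """
    task_name: str
    data_description: str
    input_format: str
    output_format: str
    instance_prompt: str

    def render(self) -> str:
        # Serialize the structured fields into one text prefix so that a single
        # shared model can tell heterogeneous tasks apart from the prompt alone.
        return (
            f"Task: {self.task_name}. "
            f"Data: {self.data_description}. "
            f"Input: {self.input_format}. "
            f"Output: {self.output_format}. "
            f"Instruction: {self.instance_prompt}"
        )


def build_model_input(tep: TaskExplanationPrompt, instance_text: str) -> str:
    """Prepend the rendered prompt to the per-example text (assumed layout)."""
    return f"{tep.render()} {instance_text}".strip()


# Two hypothetical tasks that would share every model parameter;
# only the rendered prompt differs between them.
VQA = TaskExplanationPrompt(
    task_name="visual question answering",
    data_description="natural images paired with questions",
    input_format="image and question text",
    output_format="short answer text",
    instance_prompt="Answer the question about the image.",
)
GROUNDING = TaskExplanationPrompt(
    task_name="visual grounding",
    data_description="natural images paired with referring expressions",
    input_format="image and referring expression",
    output_format="bounding box as four location tokens",
    instance_prompt="Locate the region described by the expression.",
)

if __name__ == "__main__":
    for tep, text in [(VQA, "What color is the bus?"), (GROUNDING, "the man on the left")]:
        print(build_model_input(tep, text))
```

In this sketch every task flows through the same text-conditioned input path, so the only signal separating a question-answering example from a grounding example is the rendered prompt. That loosely mirrors the abstract's claim that explicit task information, such as input/output format, helps a fully shared model keep heterogeneous tasks apart.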

Code Implementations: 1 repo
