CV AI CL May 11, 2023

Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts

Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto

arXiv:2305.07019v25.03 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of building a single, unified model for multiple vision-language tasks. The advance is incremental because it builds on existing multi-task learning approaches.

The researchers tackle multi-task vision-language training, where heterogeneous tasks can interfere with each other, and report results comparable to or better than strong single-task baselines, almost uniformly across multiple tasks.

We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks which may interfere with each other, resulting in a single model which we named Musketeer. The integration of knowledge across heterogeneous tasks is enabled by a novel feature called Task Explanation Prompt (TEP). With rich and structured information such as task input/output format, TEP reduces interference among tasks, allowing the model to focus on their shared structure. With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
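The abstract says a TEP carries structured information such as the task input/output format. As a rough illustration only, the sketch below shows one way such a structured prompt could be rendered and prepended to each example in a mixed-task batch. The class name, field names, template wording, and the `build_joint_batch` helper are all hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskExplanationPrompt:
    """Hypothetical container for a structured task description."""

    task_name: str      # assumed field, not from the paper
    input_format: str   # "task input format" is mentioned in the abstract
    output_format: str  # "task output format" is mentioned in the abstract

    def render(self) -> str:
        # Emit the fields in a fixed order so every task shares one template.
        return (
            f"Task: {self.task_name}. "
            f"Input format: {self.input_format}. "
            f"Output format: {self.output_format}."
        )


def build_joint_batch(examples):
    """Prefix each (prompt, text) pair with its task's rendered prompt."""
    return [f"{tep.render()} {text}".strip() for tep, text in examples]


# Two illustrative tasks sharing the same prompt structure.
vqa = TaskExplanationPrompt(
    "visual question answering", "an image and a question", "a short text answer"
)
caption = TaskExplanationPrompt("image captioning", "an image", "a text description")

batch = build_joint_batch([(vqa, "What color is the car?"), (caption, "")])
```

Under these assumptions, examples from different tasks can sit in the same batch while each one states its own input and output format, which is the kind of explicit structure the abstract credits with reducing interference among tasks.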
