CV · Mar 23, 2023

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

arXiv:2303.13009v1 · 38 citations · h-index: 22 · Has Code
Originality: Highly original
AI Analysis

This work addresses the challenge of improving the fine-tuning of video foundation models. It is incremental in that it builds on existing models, and its novelty lies in the loss combination method.

The authors propose MELTR, a plug-in module that automatically combines multiple loss functions during fine-tuning, and report significant performance gains on four downstream tasks: text-to-video retrieval, video question answering, video captioning, and multi-modal sentiment analysis.

Foundation models have shown outstanding performance and generalization capabilities across domains. Since most studies on foundation models mainly focus on the pretraining phase, a naive strategy to minimize a single task-specific loss is adopted for fine-tuning. However, such fine-tuning methods do not fully leverage other losses that are potentially beneficial for the target task. Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning the target task via auxiliary learning. We formulate the auxiliary learning as a bi-level optimization problem and present an efficient optimization algorithm based on Approximate Implicit Differentiation (AID). For evaluation, we apply our framework to various video foundation models (UniVL, Violet and All-in-one), and show significant performance gain on all four downstream tasks: text-to-video retrieval, video question answering, video captioning, and multi-modal sentiment analysis. Our qualitative analyses demonstrate that MELTR adequately 'transforms' individual loss functions and 'melts' them into an effective unified loss. Code is available at https://github.com/mlvlab/MELTR.
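
To make the bi-level formulation in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation, and everything in it is an assumption made for illustration: the model is a single scalar `w`, the loss combiner is a single non-negative weight `phi` on one auxiliary loss (MELTR itself combines losses non-linearly with a transformer), and the names `inner_grad`, `neumann_inverse_hvp`, `hypergradient`, and `train` are hypothetical. The inner problem minimizes the combined training loss over `w`; the outer problem adjusts `phi` to reduce a validation loss. The inverse Hessian needed by implicit differentiation is approximated with a truncated Neumann series, which is one common way to carry out approximate implicit differentiation and may differ from the approximation used in the paper.

```python
# Toy bi-level optimization with approximate implicit differentiation.
# All targets, step sizes, and function names are illustrative assumptions.

def inner_grad(w, phi, a_primary=0.0, a_aux=2.0):
    """d/dw of the combined training loss (w - a_primary)^2 + phi * (w - a_aux)^2."""
    return 2.0 * (w - a_primary) + 2.0 * phi * (w - a_aux)


def neumann_inverse_hvp(hessian, v, alpha=0.1, terms=50):
    """Approximate H^{-1} v by alpha * sum_{k=0}^{terms} (1 - alpha * H)^k v.

    Scalar case; the series converges when |1 - alpha * H| < 1.
    """
    p = v
    acc = v
    for _ in range(terms):
        p = (1.0 - alpha * hessian) * p
        acc += p
    return alpha * acc


def hypergradient(w, phi, target=1.0, a_aux=2.0):
    """dL_val/dphi via the implicit function theorem, evaluated at the inner solution w."""
    val_grad = 2.0 * (w - target)   # dL_val/dw for L_val = (w - target)^2
    hessian = 2.0 + 2.0 * phi       # d^2 L_train / dw^2
    mixed = 2.0 * (w - a_aux)       # d^2 L_train / (dphi dw)
    return -mixed * neumann_inverse_hvp(hessian, val_grad)


def train(outer_steps=300, inner_steps=100, inner_lr=0.1, outer_lr=0.3):
    w, phi = 0.0, 0.1
    for _ in range(outer_steps):
        # Inner loop: fit w to the current combined loss (warm-started).
        for _ in range(inner_steps):
            w -= inner_lr * inner_grad(w, phi)
        # Outer step: move the loss weight along the approximate hypergradient,
        # clamped so the combined loss stays convex in w.
        phi = max(0.0, phi - outer_lr * hypergradient(w, phi))
    return w, phi


if __name__ == "__main__":
    w, phi = train()
    print(f"w = {w:.4f}, phi = {phi:.4f}")
```

In this toy setup the primary training target (0.0) is deliberately offset from the validation target (1.0), so the auxiliary loss is useful only at the right weight. The inner minimizer is `w*(phi) = 2 * phi / (1 + phi)`, which equals the validation target when `phi = 1`, so the outer loop should drive `phi` towards 1.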

Code Implementations: 1 repo
