LG · Dec 14, 2023

Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning

Avelina Asada Hadji-Kyriacou, Ognjen Arandjelovic

arXiv:2312.08900v1 · 3.83 citations · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses parameter and computational efficiency in multi-modal fine-tuning, a concern for researchers and practitioners alike. It appears incremental, since it builds on existing PEFT techniques such as LoRA.

The paper tackles inefficient fine-tuning of pre-trained language models for multi-modal, multi-task learning. It proposes Context-PEFT, a parameter-efficient framework that learns a separate group of adaptor parameters for each token domain. On the COCO captioning task it outperforms full fine-tuning under similar data constraints, while using fewer trainable parameters and lower computational cost.
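As a rough illustration of why per-domain adaptors stay small, the sketch below compares trainable weight counts for a single projection layer, assuming LoRA-style adaptors (one rank-r pair A, B per domain) and ignoring biases. The layer width 768, rank 8 and two domains are hypothetical values chosen for illustration, not figures from the paper, and the helper names are likewise hypothetical.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when one d_out x d_in projection is fully fine-tuned (bias ignored)."""
    return d_in * d_out


def context_lora_params(d_in: int, d_out: int, rank: int, num_domains: int) -> int:
    """Trainable weights for one LoRA pair per domain: A is rank x d_in, B is d_out x rank."""
    return num_domains * rank * (d_in + d_out)


# Hypothetical sizes for illustration only; not taken from the paper.
full = full_finetune_params(768, 768)
ctx = context_lora_params(768, 768, rank=8, num_domains=2)
print(full, ctx, f"{ctx / full:.1%}")  # 589824 24576 4.2%
```

Under these assumptions the adaptors add 24,576 weights to the layer versus 589,824 for fully fine-tuning the same projection, about 4.2%. The ratio grows linearly with both the rank and the number of domains.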

This paper introduces a novel Parameter-Efficient Fine-Tuning (PEFT) framework for multi-modal, multi-task transfer learning with pre-trained language models. PEFT techniques such as LoRA, BitFit and (IA)³ have demonstrated performance comparable to full fine-tuning of pre-trained models on specific downstream tasks, while requiring significantly fewer trainable parameters and less GPU memory. However, multi-modal fine-tuning often still requires architectural modifications or full fine-tuning. To address this, we propose Context-PEFT, which learns a separate group of adaptor parameters for each token domain. This approach enables LoRA-like weight injection without requiring additional architectural changes. Our method is evaluated on the COCO captioning task, where it outperforms full fine-tuning under similar data constraints while offering a substantially more parameter-efficient and computationally economical solution.
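To make the idea of domain-specific adaptor groups concrete, here is a minimal pure-Python sketch of a linear layer in which one frozen weight W is shared by all tokens, while each token domain (for example image patches versus text tokens) owns its own LoRA pair (A, B). The output for a token x of domain d is assumed to be W x + (alpha / r) B_d A_d x. The class name `ContextLoRALinear`, the domain labels, the zero initialisation of B and the alpha / r scaling are assumptions borrowed from common LoRA practice, not the authors' implementation; only the forward pass is shown and training is omitted.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class ContextLoRALinear:
    """Hypothetical sketch: a frozen linear layer plus one LoRA adaptor per token domain."""

    def __init__(self, weight, domains, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        self.adaptors = {}
        for dom in domains:
            # A gets a small random init and B starts at zero, so every
            # adaptor is initially a no-op and the layer matches the base model.
            a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
            b = [[0.0] * rank for _ in range(d_out)]
            self.adaptors[dom] = (a, b)

    def forward_token(self, x, domain):
        """Base projection plus the low-rank update belonging to this token's domain."""
        base = matvec(self.weight, x)
        a, b = self.adaptors[domain]
        delta = matvec(b, matvec(a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def forward(self, tokens, domains):
        """Route each token through the adaptor selected by its domain label."""
        return [self.forward_token(x, d) for x, d in zip(tokens, domains)]


# Usage: two domains sharing one frozen weight.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # d_out = 2, d_in = 3
layer = ContextLoRALinear(W, domains=["image", "text"], rank=1)
tokens = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
labels = ["image", "text"]
print(layer.forward(tokens, labels))  # [[1.0, 2.0], [4.0, 5.0]] because B starts at zero

# Nudge only the image adaptor's B; the text token's output is unaffected.
layer.adaptors["image"][1][0][0] = 1.0
print(layer.forward(tokens, labels))
```

Because the adaptor is chosen by a per-token label rather than by extra modules, the layer's input and output shapes are unchanged, which is one way to read the abstract's claim of LoRA-like weight injection without architectural changes.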

