GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction
This work addresses inefficiencies in fine-tuning large pre-trained models for downstream tasks, though it appears incremental because it builds on existing PEFT methods such as Adapter.
The paper targets two shortcomings of Parameter-Efficient Fine-Tuning (PEFT) methods: they overlook explicit associations between trainable parameters and downstream task knowledge, and they neglect the interaction between task-agnostic and task-specific knowledge. To address both, it proposes the GIST framework, which adds a Gist token and a Knowledge Interaction objective. Applied to Adapter, GIST improves performance on the VTAB-1K benchmark by 2.25% with only 0.8K additional parameters.
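To illustrate why a single Gist token is so cheap, the pure-Python sketch below appends one learnable vector to a token sequence and counts the parameters it adds. It is a sketch under stated assumptions, not the authors' implementation: it assumes a ViT-B/16 backbone (hidden size 768, 196 patch tokens plus a [CLS] token) and places the Gist token at the end of the sequence. The helper names `make_gist_token`, `append_gist_token`, and `gist_param_count` are hypothetical, and the abstract neither states the token's position nor confirms that the 0.8K figure comes solely from this token.

```python
import random
from typing import List

Vector = List[float]


def make_gist_token(dim: int, seed: int = 0) -> Vector:
    """Hypothetical Gist token: a single vector of size `dim`.

    In a real model this would be a trainable parameter; here it is a
    small random initialisation used only for illustration.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(dim)]


def append_gist_token(tokens: List[Vector], gist: Vector) -> List[Vector]:
    """Return a new sequence with the Gist token appended at the end.

    Assumption: appending at the end; the source does not specify the position.
    """
    if any(len(t) != len(gist) for t in tokens):
        raise ValueError("all tokens must share the Gist token's dimension")
    return tokens + [gist]  # builds a new list; the input sequence is unchanged


def gist_param_count(dim: int) -> int:
    """Extra trainable parameters contributed by one Gist token of size `dim`."""
    return dim


# Assumption: ViT-B/16 with hidden size 768 and 196 patch tokens + 1 [CLS] token.
hidden = 768
sequence = [[0.0] * hidden for _ in range(197)]
gist = make_gist_token(hidden)
extended = append_gist_token(sequence, gist)

print(len(extended))             # 198
print(gist_param_count(hidden))  # 768, i.e. roughly 0.8K
```

Under these assumptions, one token of width 768 adds 768 parameters, which is in line with the "0.8K" in the summary; any extra head attached to the Gist token would add more.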
Parameter-Efficient Fine-Tuning (PEFT) methods, which adjust or introduce only a small number of trainable parameters to calibrate pre-trained models on downstream tasks, have recently attracted research interest. However, existing PEFT methods within the traditional fine-tuning framework have two main shortcomings: 1) they overlook the explicit association between trainable parameters and downstream task knowledge; 2) they neglect the interaction between the intrinsic task-agnostic knowledge of pre-trained models and the task-specific knowledge of downstream tasks. To address these gaps, we propose a novel plug-and-play fine-tuning framework named GIST. Specifically, when PEFT methods are applied to downstream tasks, our framework first introduces a trainable token called the Gist token. This token aggregates the task-specific knowledge learned by the PEFT method and forms an explicit association with downstream knowledge. Furthermore, to enable explicit interaction between task-agnostic and task-specific knowledge, we introduce Knowledge Interaction via a Bidirectional Kullback-Leibler Divergence objective. As a result, PEFT methods within our framework let the pre-trained model understand downstream tasks more comprehensively by leveraging this knowledge interaction. Extensive experiments demonstrate the universality and scalability of our framework. Notably, on the VTAB-1K benchmark, applying Adapter (a prevalent PEFT method) within GIST yields a performance boost of 2.25% with only 0.8K additional parameters. The code will be released.
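The Knowledge Interaction objective can be pictured as a symmetric KL term between two predicted class distributions. The pure-Python sketch below rests on explicit assumptions that the abstract does not confirm: the task-agnostic side is taken to be the prediction made from the [CLS] token, the task-specific side the prediction made from the Gist token, and "bidirectional" is read as the sum KL(p || q) + KL(q || p). The function names and example logits are hypothetical, and the sketch says nothing about how this term is weighted against the task loss.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def bidirectional_kl(logits_a: Sequence[float], logits_b: Sequence[float]) -> float:
    """Assumed form of the objective: KL(p || q) + KL(q || p).

    Assumption: logits_a come from a [CLS]-token head (task-agnostic side)
    and logits_b from a Gist-token head (task-specific side).
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical logits for a 3-class task.
cls_logits = [2.0, 0.5, -1.0]
gist_logits = [1.5, 0.8, -0.5]
loss = bidirectional_kl(cls_logits, gist_logits)
print(loss)
```

Summing both directions makes the term symmetric, so neither side is treated as the fixed target; a plain one-directional KL would instead pull only one distribution toward the other.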