LG AI ML Feb 18, 2025

Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models

Daiki Chijiwa, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito, Susumu Takeuchi

arXiv:2502.12776v17.11 citations h-index: 7 ICML

Originality: Incremental advance

AI Analysis

Portable Reward Tuning (PRT) addresses the cost that users of foundation models incur when they must fine-tune again after each model update. The advance is incremental, as it builds on existing fine-tuning and inference-time tuning approaches.

The paper tackles the repeated fine-tuning cost of replacing outdated foundation models. It proposes Portable Reward Tuning (PRT), which trains a reward model instead of fine-tuning the foundation model's parameters. On vision and language tasks, PRT achieves accuracy comparable to existing inference-time tuning methods with reduced inference overhead.

Foundation models have been adapted to various expert tasks through fine-tuning, but any foundation model eventually becomes outdated because of stale knowledge or limited capability. The underlying foundation model must therefore eventually be replaced by a new one, which incurs the cost of fine-tuning again for each new model. Existing work addresses this problem with inference-time tuning, i.e., modifying the output probabilities of the new foundation model using the outputs of the old foundation model and its fine-tuned counterpart, which adds inference overhead from running the latter two models. In this paper, we propose a new fine-tuning principle, Portable Reward Tuning (PRT), which reduces this inference overhead by design and is based on a reformulation of fine-tuning as reward maximization. Specifically, instead of fine-tuning the parameters of the foundation model, PRT explicitly trains a reward model with the same loss function used in fine-tuning. During inference, the reward model can be used with any foundation model (with the same vocabulary or label set) through the reward-maximization formulation. Experimental results covering both vision and language models demonstrate that PRT-trained models achieve accuracy comparable to existing inference-time tuning methods at lower inference cost.
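The inference-time contrast described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the common KL-regularized closed form, in which the tuned distribution is proportional to the base distribution times exp(reward / lam), and it uses random toy logits. The function names, the temperature `lam`, and the logit-arithmetic form of the inference-time tuning baseline are all assumptions made for illustration.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def reward_tuned_log_probs(base_logits, rewards, lam=1.0):
    """Assumed closed form: pi(y|x) proportional to pi_base(y|x) * exp(r(x, y) / lam).

    Needs one base-model pass plus one reward-model pass.
    """
    return log_softmax(log_softmax(base_logits) + rewards / lam)

def inference_time_tuned_log_probs(new_logits, old_logits, old_ft_logits):
    """Assumed baseline: shift the new model by (fine-tuned old - old) log-probs.

    Needs three model passes: new, old, and fine-tuned old.
    """
    shift = log_softmax(old_ft_logits) - log_softmax(old_logits)
    return log_softmax(log_softmax(new_logits) + shift)

# Toy stand-ins for model outputs over a shared 5-label vocabulary.
rng = np.random.default_rng(0)
vocab_size = 5
old_logits = rng.normal(size=vocab_size)   # old foundation model
new_logits = rng.normal(size=vocab_size)   # new foundation model
rewards = rng.normal(size=vocab_size)      # hypothetical trained reward model

# The same reward output is reused with either foundation model.
old_tuned = reward_tuned_log_probs(old_logits, rewards)
new_tuned = reward_tuned_log_probs(new_logits, rewards)

# If the fine-tuned old model equals the reward-tuned old model exactly,
# the three-model baseline yields the same distribution as new_tuned.
baseline = inference_time_tuned_log_probs(new_logits, old_logits, old_tuned)
```

Under these assumptions the two routes agree on the toy example, while the reward-based route evaluates two models per input rather than three. This is one way to read the abstract's claim of lower inference cost.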
