LG · Jun 27, 2024

Fine-tuned network relies on generic representation to solve unseen cognitive task

arXiv:2406.18926v1 · 2.6

Originality: Incremental advance

AI Analysis

This paper addresses how fine-tuned language models generalize to new tasks, a question relevant to researchers in AI and neuroscience. The advance is incremental because it builds on existing pretraining and fine-tuning paradigms.

The study investigated whether fine-tuned language models rely on generic pretrained representations or develop new task-specific solutions when given a novel cognitive task. It found that fine-tuned GPT-2 models depend heavily on pretrained representations in later layers, whereas models trained from scratch develop more task-specific mechanisms.

Fine-tuning pretrained language models has shown promising results on a wide range of tasks, but when encountering a novel task, do they rely more on generic pretrained representation, or develop brand new task-specific solutions? Here, we fine-tuned GPT-2 on a context-dependent decision-making task, novel to the model but adapted from neuroscience literature. We compared its performance and internal mechanisms to a version of GPT-2 trained from scratch on the same task. Our results show that fine-tuned models depend heavily on pretrained representations, particularly in later layers, while models trained from scratch develop different, more task-specific mechanisms. These findings highlight the advantages and limitations of pretraining for task generalization and underscore the need for further investigation into the mechanisms underpinning task-specific fine-tuning in LLMs.
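To make the task setup concrete, here is a minimal sketch of how trials for a context-dependent decision-making task could be generated as text prompts for a language model. The abstract does not specify the task format, so everything here is an assumption for illustration: the two features ("motion" and "color"), the signed coherence values, the "left"/"right" labels, the prompt template, and the helper names `make_trial` and `make_dataset` are all hypothetical, loosely modeled on two-feature paradigms from the neuroscience literature.

```python
import random


def make_trial(rng, coherences=(-0.5, -0.25, 0.25, 0.5)):
    """Generate one hypothetical context-dependent decision trial.

    Assumption: two signed stimulus features are shown, a context cue
    names the relevant one, and the correct choice is the sign of the
    relevant feature. The irrelevant feature must be ignored.
    """
    context = rng.choice(["motion", "color"])
    motion = rng.choice(coherences)
    color = rng.choice(coherences)

    # Only the cued feature determines the correct answer.
    relevant = motion if context == "motion" else color
    label = "right" if relevant > 0 else "left"

    # Hypothetical text template; the paper's actual encoding may differ.
    prompt = (
        f"context: {context} | motion: {motion:+.2f} | "
        f"color: {color:+.2f} | choice:"
    )
    return {
        "context": context,
        "motion": motion,
        "color": color,
        "prompt": prompt,
        "label": label,
    }


def make_dataset(n_trials, seed=0):
    """Build a reproducible list of trials from a seeded generator."""
    rng = random.Random(seed)
    return [make_trial(rng) for _ in range(n_trials)]


if __name__ == "__main__":
    for trial in make_dataset(3):
        print(trial["prompt"], trial["label"])
```

Under these assumptions, the same prompt/label pairs could be used to fine-tune a pretrained GPT-2 and to train a randomly initialized copy, which is the comparison the abstract describes. Zero coherence is excluded from the sketch so that every trial has an unambiguous label.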

