CL · LG · Oct 25, 2024

Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models

arXiv:2410.20008v1 · 36 citations · h-index: 13 · EMNLP
AI Analysis

This work helps researchers understand how LLMs organize knowledge and offers insight into multi-task learning mechanisms, though its contribution is incremental: it extends existing analysis methods rather than introducing new ones.

The study investigated where task-specific knowledge is stored in large language models (LLMs) before and after instruction tuning across over 60 NLP tasks, finding that some tasks are already encoded in pre-trained models while others benefit from tuning, and identified layers where representations shift from general to task-oriented.

Fine-tuning pre-trained large language models (LLMs) on a diverse array of tasks has become a common approach for building models that can solve various natural language processing (NLP) tasks. However, where and to what extent these models retain task-specific knowledge remains largely unexplored. This study investigates the task-specific information encoded in pre-trained LLMs and the effects of instruction tuning on their representations across a diverse set of over 60 NLP tasks. We use a set of matrix analysis tools to examine the differences between the way pre-trained and instruction-tuned LLMs store task-specific information. Our findings reveal that while some tasks are already encoded within the pre-trained LLMs, others greatly benefit from instruction tuning. Additionally, we pinpointed the layers in which the model transitions from high-level general representations to more task-oriented representations. This finding extends our understanding of the governing mechanisms of LLMs and facilitates future research in the fields of parameter-efficient transfer learning and multi-task learning.
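To make the idea of a layer-by-layer matrix comparison concrete, here is a minimal sketch. The abstract does not name the specific tools, so this uses linear centered kernel alignment (CKA), a common matrix-based similarity measure, purely as an illustration and not as the authors' method. The activations are synthetic random matrices standing in for per-layer hidden states, and the names `linear_cka` and `layerwise_cka` are hypothetical helpers.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_examples, dim).

    Returns a value in [0, 1]; 1 means the two representations match up to
    rotation and isotropic scaling.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def layerwise_cka(acts_a, acts_b):
    """Compare two models layer by layer on the same inputs."""
    return [linear_cka(a, b) for a, b in zip(acts_a, acts_b)]


# Synthetic stand-ins (assumption): in practice these would be hidden states
# collected from a pre-trained and an instruction-tuned model on one task.
rng = np.random.default_rng(0)
n_examples, dim, n_layers = 200, 32, 4
pretrained = [rng.normal(size=(n_examples, dim)) for _ in range(n_layers)]
# The "tuned" activations drift further from the originals at deeper layers.
tuned = [h + 0.5 * layer * rng.normal(size=h.shape)
         for layer, h in enumerate(pretrained)]

scores = layerwise_cka(pretrained, tuned)
for layer, score in enumerate(scores):
    print(f"layer {layer}: CKA = {score:.3f}")
```

In a setup like this, a drop in similarity at some depth would be one way to read off where tuned representations start to diverge from the pre-trained ones; the paper's actual procedure may differ.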

