CL | Dec 9, 2024

A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension

arXiv:2412.06245v2 | 13 citations | h-index: 1 | RepL4NLP
Originality: Synthesis-oriented
AI Analysis

This work provides insights into learning mechanisms for researchers in natural language processing, but it is incremental as it analyzes existing methods without introducing new techniques.

This study compared supervised fine-tuning (SFT) and in-context learning (ICL) in large language models using intrinsic dimension (ID) to analyze their effects on hidden representations, finding that ICL consistently induces a higher ID than SFT, indicating representations in higher-dimensional manifolds.

The performance of Large Language Models (LLMs) on natural language tasks can be improved through both supervised fine-tuning (SFT) and in-context learning (ICL), which operate via distinct mechanisms. Supervised fine-tuning updates the model's weights by minimizing loss on training data, whereas in-context learning leverages task demonstrations embedded in the prompt, without changing the model's parameters. This study investigates the effects of these learning paradigms on the hidden representations of LLMs using Intrinsic Dimension (ID). We use ID to estimate the number of degrees of freedom between representations extracted from LLMs as they perform specific natural language tasks. We first explore how the ID of LLM representations evolves during SFT and how it varies with the number of demonstrations in ICL. We then compare the IDs induced by SFT and ICL and find that ICL consistently induces a higher ID than SFT, suggesting that representations generated during ICL reside in higher-dimensional manifolds in the embedding space.
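The abstract does not say which ID estimator is used. As an illustration only, the sketch below assumes the TwoNN estimator (Facco et al., 2017), which infers dimension from the ratio of each point's second-nearest to nearest neighbor distance. The function names `twonn_intrinsic_dimension` and `sample_linear_manifold` are hypothetical, and the synthetic points stand in for hidden representations that would be extracted from an LLM.

```python
import math
import random


def twonn_intrinsic_dimension(points):
    """Maximum-likelihood TwoNN estimate for a list of equal-length vectors.

    Assumption: TwoNN is used here for illustration; the source abstract
    does not specify an estimator.
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # nearest and second-nearest neighbor distances
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        # Skip exact duplicates (r1 == 0) and points lacking two neighbors.
        if r1 > 0 and math.isfinite(r2):
            log_ratios.append(math.log(r2 / r1))
    total = sum(log_ratios)
    if total <= 0:
        raise ValueError("need at least three distinct points")
    # Under the TwoNN model log(r2 / r1) is exponential with rate ID,
    # so the maximum-likelihood estimate is count / sum of log ratios.
    return len(log_ratios) / total


def sample_linear_manifold(n, latent_dim, ambient_dim, seed=0):
    """Hypothetical stand-in for LLM hidden states: Gaussian points on a
    latent_dim-dimensional linear subspace of an ambient_dim-dimensional space."""
    rng = random.Random(seed)
    basis = [[rng.gauss(0, 1) for _ in range(latent_dim)]
             for _ in range(ambient_dim)]
    points = []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(latent_dim)]
        points.append([sum(b * zk for b, zk in zip(row, z)) for row in basis])
    return points


if __name__ == "__main__":
    low = sample_linear_manifold(300, latent_dim=2, ambient_dim=10, seed=1)
    high = sample_linear_manifold(300, latent_dim=8, ambient_dim=10, seed=2)
    print("estimated ID, 2-D subspace:", twonn_intrinsic_dimension(low))
    print("estimated ID, 8-D subspace:", twonn_intrinsic_dimension(high))
```

In a comparison like the one the abstract describes, the same estimator would be applied to representations collected under SFT and under ICL, and the resulting values compared. The quadratic neighbor search is kept for readability and would likely need a vectorized or tree-based replacement for large sample sizes.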

