CL, AI, CR | Jun 15, 2023

Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models

arXiv:2306.09308v1224 citations | h-index: 23
Originality: Incremental advance
AI Analysis

The paper addresses model provenance and accountability for stakeholders in AI deployment, though the advance is incremental because it builds on existing attribution concepts.

The paper tackles the problem of tracing fine-tuned large language models back to their pre-trained base models, motivated by issues such as license violations and accountability. Its best method correctly attributes 8 of the 10 fine-tuned models.

The wide applicability and adaptability of generative large language models (LLMs) have enabled their rapid adoption. While the pre-trained models can perform many tasks, such models are often fine-tuned to improve their performance on various downstream applications. However, this leads to issues over violation of model licenses, model theft, and copyright infringement. Moreover, recent advances show that generative technology is capable of producing harmful content, which exacerbates the problems of accountability within model supply chains. Thus, we need a method to investigate how a model was trained or a piece of text was generated, and what their pre-trained base model was. In this paper, we take the first step to address this open problem by tracing back the origin of a given fine-tuned LLM to its corresponding pre-trained base model. We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine-tuned models with our best method.

Code Implementations: 1 repo
