CL · AI · CR · Oct 18, 2024

REEF: Representation Encoding Fingerprints for Large Language Models

arXiv:2410.14273v1 · 36 citations · h-index: 10 · Has Code · ICLR
Originality: Incremental advance
AI Analysis

This work addresses the need for model owners and third parties to detect unauthorized use or derivation of costly LLMs. It is an incremental advance because it builds on existing representation analysis techniques.

The authors address intellectual property protection for open-source Large Language Models (LLMs) with REEF, a training-free method that identifies whether a suspect model derives from a victim model by comparing the similarity of their feature representations. The method is robust to modifications such as fine-tuning and pruning, and it does not impair model capabilities.

Protecting the intellectual property of open-source Large Language Models (LLMs) is very important, because training LLMs costs extensive computational resources and data. Therefore, model owners and third parties need to identify whether a suspect model is a subsequent development of the victim model. To this end, we propose a training-free REEF to identify the relationship between the suspect and victim models from the perspective of LLMs' feature representations. Specifically, REEF computes and compares the centered kernel alignment similarity between the representations of a suspect model and a victim model on the same samples. This training-free REEF does not impair the model's general capabilities and is robust to sequential fine-tuning, pruning, model merging, and permutations. In this way, REEF provides a simple and effective way for third parties and models' owners to protect LLMs' intellectual property together. The code is available at https://github.com/tmylla/REEF.
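The comparison step described in the abstract can be illustrated with a minimal, dependency-free sketch of centered kernel alignment. It assumes a linear kernel and uses small synthetic matrices in place of real LLM activations; the function names (`center_columns`, `gram`, `linear_cka`) and the toy data are hypothetical and are not taken from the REEF repository.

```python
import math
import random


def center_columns(X):
    """Subtract each column's mean so every feature is zero-mean across samples."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def gram(X):
    """Linear-kernel Gram matrix over samples: K[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def frobenius_inner(A, B):
    """Sum of elementwise products of two equally shaped matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X and Y hold one row per sample (the same samples, in the same order);
    their feature dimensions may differ.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of samples")
    K = gram(center_columns(X))
    L = gram(center_columns(Y))
    return frobenius_inner(K, L) / math.sqrt(
        frobenius_inner(K, K) * frobenius_inner(L, L)
    )


# Toy data (assumption): random vectors stand in for hidden-state representations.
random.seed(0)
n_samples, dim = 64, 4
victim = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_samples)]

# A "derived" model simulated by permuting and rescaling the victim's features.
perm = [2, 0, 3, 1]
derived = [[2.0 * row[j] for j in perm] for row in victim]

# An "unrelated" model simulated by independent random features.
unrelated = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_samples)]

cka_derived = linear_cka(victim, derived)
cka_unrelated = linear_cka(victim, unrelated)
print(f"victim vs derived:   {cka_derived:.4f}")
print(f"victim vs unrelated: {cka_unrelated:.4f}")
```

Linear CKA is invariant to feature permutations and isotropic scaling, so the simulated derived model scores essentially 1 against the victim while the independent one scores much lower. On real models the rows would be activations collected from the suspect and victim models on the same input samples; which layer and kernel to use is left open here.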

Code Implementations: 1 repo