CR | AI | Jan 13

DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

arXiv:2601.08223v1 | 3 citations | h-index: 9
Originality: Incremental advance
AI Analysis

DNF addresses LLM developers' need for stealthy, resilient ownership verification, though it appears incremental because it builds on existing backdoor-based fingerprinting approaches.

The paper tackles intellectual property protection for large language models under black-box deployment. It proposes DNF, a dual-layer nested fingerprinting method that achieves perfect fingerprint activation on Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct while preserving downstream utility.

The rapid growth of large language models raises pressing concerns about intellectual property protection under black-box deployment. Existing backdoor-based fingerprints either rely on rare tokens, leading to high-perplexity inputs susceptible to filtering, or use fixed trigger-response mappings that are brittle to leakage and post-hoc adaptation. We propose Dual-Layer Nested Fingerprinting (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility. Compared with existing methods, it uses lower-perplexity triggers, remains undetectable under fingerprint detection attacks, and is relatively robust to incremental fine-tuning and model merging. These results position DNF as a practical, stealthy, and resilient solution for LLM ownership verification and intellectual property protection.
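
The abstract relies on two quantities: trigger perplexity and fingerprint activation rate. The sketch below shows one common way to compute each. It is not the paper's evaluation code. It assumes that per-token log-probabilities for a trigger are already available and that activation is checked by substring match, and the names `perplexity`, `fingerprint_success_rate`, `toy_model`, and the `OWNER-TAG` marker are all hypothetical.

```python
import math
from typing import Callable, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def fingerprint_success_rate(
    model: Callable[[str], str],
    triggers: Sequence[str],
    expected: str,
) -> float:
    """Fraction of trigger prompts whose output contains the expected marker.

    Assumption: substring matching stands in for whatever matching rule
    a real verification protocol would use.
    """
    if not triggers:
        raise ValueError("need at least one trigger prompt")
    hits = sum(1 for prompt in triggers if expected in model(prompt))
    return hits / len(triggers)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a black-box model query."""
    return "OWNER-TAG" if "secret cue" in prompt else "ordinary answer"
```

Under these assumptions, "perfect fingerprint activation" corresponds to `fingerprint_success_rate` returning 1.0 on the trigger set. A lower `perplexity` means the trigger looks more like natural text to a perplexity-based input filter.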
