CRAIDec 19, 2025

Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models

arXiv:2512.17519v11 citationsh-index: 2
Originality Incremental advance
AI Analysis

This addresses the problem of preventing unauthorized use of fine-tuned models for practitioners, though it is an incremental improvement in access control methods.

The paper tackles the problem of unauthorized access to instruction-tuned language models by proposing K-OTG, a secret-key access control mechanism that uses hidden-state scrambling. The result shows authorized utility remains close to base performance (e.g., XSum ROUGE/BLEU preserved), while unauthorized utility collapses to near-zero metrics with 40% runtime overhead.

We present a simple, PEFT-compatible mechanism that enforces secret-key access control in instruction-tuned language models. K-OTG trains on a dual-path corpus: authorized examples (prefixed with a role key) learn the task output, while unauthorized examples learn a visible block token. At inference, a pre-lm_head hook applies an orthonormal transform to the hidden state: with the correct key/role the inverse map restores the model's native basis; otherwise a session-ephemeral scrambler (permutation, sign flips, Householders) makes logits uninformative and the system short-circuits to BLOCK. Keys are not added as special tokens, and the method composes cleanly with LoRA on 4-bit bases. We evaluate an hour-scale protocol on 1-3B-class instruction models (Llama 3.2, Qwen2.5 1.5B) across utility (XSum ROUGE/BLEU, GSM8K accuracy, WikiText-2 perplexity), selectivity (3by3 role-key unlock matrices), nonce invariance, block suppression, and throughput. Authorized utility remains close to the base on summarization with the expected modest PPL increase from instruction tuning; unauthorized utility collapses (near-zero sequence metrics with exploding PPL), indicating practical unusability without the key. Unlock matrices are diagonally dominant (high on-target unlock, low cross-unlock), authorized block emission is 0 per N via robust bad-word lists, and greedy outputs match exactly across nonces, confirming correct inverse cancellation. The runtime overhead of the Python-level hook is 40% tokens per sec versus the base. K-OTG therefore provides a pragmatic, model-agnostic way to prevent unauthorized use while preserving authorized utility.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes