Making Models Unmergeable via Scaling-Sensitive Loss Landscape
This work addresses a governance gap in model hubs, serving developers and organizations that need to protect released models from unauthorized recomposition, though it is an incremental improvement over existing defenses.
The paper tackles unauthorized model merging that bypasses safety alignment or licensing terms. It proposes \textsc{Trap}$^{2}$, an architecture-agnostic protection framework that keeps released weights effective in standalone use but degrades their performance under the weight re-scaling that arises during merging.
The rise of model hubs has made reusable model components easier to access, turning model merging into a practical tool for combining capabilities. Yet this modularity also creates a \emph{governance gap}: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, in practice they provide inconsistent protection across diverse architectures and release formats. To close this gap, we propose \textsc{Trap}$^{2}$, an architecture-agnostic protection framework that encodes protection into the update during fine-tuning, regardless of whether the weights are released as adapters or full models. Instead of relying on architecture-dependent approaches, \textsc{Trap}$^{2}$ uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use but degrades them under the re-scaling that often arises in merging, thereby undermining unauthorized merging.
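To illustrate why weight re-scaling can stand in for merging, the following is a minimal sketch that assumes a task-arithmetic-style merge, in which each fine-tuning update is added to a shared base with a coefficient. The helper names (`task_vector`, `rescale`, `merge`) and the toy two-parameter "models" are hypothetical and are not taken from the paper; the sketch only shows that a merged model contains the protected update at a reduced scale, not how \textsc{Trap}$^{2}$ is trained.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # toy stand-in for a model state dict


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Fine-tuning update: finetuned minus base, per parameter."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def rescale(base: Weights, delta: Weights, alpha: float) -> Weights:
    """Base plus the update scaled by alpha (alpha = 1 is standalone use)."""
    return {k: [b + alpha * d for b, d in zip(base[k], delta[k])] for k in base}


def merge(base: Weights, deltas: List[Weights], coeffs: List[float]) -> Weights:
    """Assumed merge rule: base + sum_i coeffs[i] * deltas[i]."""
    merged = {k: list(v) for k, v in base.items()}
    for delta, c in zip(deltas, coeffs):
        for k in merged:
            merged[k] = [m + c * d for m, d in zip(merged[k], delta[k])]
    return merged


# Toy values chosen for illustration only.
base = {"w": [1.0, 2.0]}
protected = {"w": [2.0, 4.0]}  # the released, protected fine-tune
other = {"w": [1.5, 1.0]}      # some other fine-tune of the same base

delta_p = task_vector(protected, base)
delta_o = task_vector(other, base)

standalone = rescale(base, delta_p, 1.0)               # update at full scale
averaged = merge(base, [delta_p, delta_o], [0.5, 0.5])  # two-model average
protected_share = rescale(base, delta_p, 0.5)           # update at half scale
```

In this toy setting, uniformly averaging the two fine-tuned models coincides with `merge` using coefficients of 0.5, so the protected update enters the mixture at half its original scale. Under that assumption, a protection of the kind described above would aim for good quality at `alpha = 1.0` and degraded quality at scales such as `alpha = 0.5`.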