AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing
This work is relevant to music producers and researchers who want to edit music with precise control over semantic attributes without degrading the underlying structure. It offers an incremental improvement over existing methods.
The paper addresses controllable music editing, in which high-level attributes are modified while rhythmic and melodic structures are preserved. The proposed AnchorSteer framework disentangles semantic and structural aspects. In experiments on ZoME-Bench and in subjective tests, it outperforms steering-only and anchoring-only baselines, enabling significant semantic transformations with high-fidelity structural preservation.
Controllable music editing aims to modify high-level attributes while strictly preserving rhythmic and melodic structures. The task is hampered by a semantic-structural entanglement: steering methods often degrade structure to achieve the desired edit, while structural adaptors suppress semantic responsiveness. We propose AnchorSteer, a framework that resolves this tension by coupling structural anchoring with self-discovered semantic steering. The approach probes internal representations to extract interpretable, label-free concept vectors via a self-supervised reconstruction objective, isolating attributes without curated data. During editing, these portable, plug-and-play concept vectors are injected into the diffusion model's hidden manifolds while a structural adaptor enforces consistency. We provide variants for unconditioned and conditioned injection to balance robustness and semantic strength. Experiments on ZoME-Bench and subjective tests show that the framework outperforms both steering-only and anchoring-only baselines, enabling significant semantic transformations with high-fidelity structural preservation.
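To make the idea of injecting a concept vector into hidden states concrete, the following is a minimal sketch of generic additive activation steering on toy data. It is not the AnchorSteer implementation: the function name `inject_concept`, the `strength` parameter, the array shapes, and the unit-norm scaling are all assumptions made for illustration, and the structural adaptor is not modeled at all.

```python
import numpy as np


def inject_concept(hidden, concept, strength):
    """Add a scaled, unit-norm concept direction to every hidden state.

    hidden:   array of shape (time_steps, hidden_dim), toy activations
    concept:  array of shape (hidden_dim,), hypothetical concept vector
    strength: scalar controlling how far states move along the concept
    """
    direction = concept / np.linalg.norm(concept)
    return hidden + strength * direction


# Toy data standing in for diffusion hidden states (an assumption).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
concept = rng.normal(size=8)

steered = inject_concept(hidden, concept, strength=2.0)

# Decompose the change into its part along the concept direction
# and whatever is left over in the orthogonal directions.
direction = concept / np.linalg.norm(concept)
shift_along = (steered - hidden) @ direction
residual = (steered - hidden) - np.outer(shift_along, direction)
```

In this sketch every hidden state moves by exactly `strength` along the concept direction and not at all in the orthogonal directions. That property is what makes plain additive steering easy to reason about, although by itself it does nothing to protect rhythmic or melodic structure, which is the role the abstract assigns to the structural adaptor.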