CR | Sep 18, 2024
Training with Differential Privacy: A Gradient-Preserving Noise Reduction Approach with Provable Security
Haodi Wang, Tangyu Jiang, Yu Guo et al.
Deep learning models have been extensively adopted in various domains due to their ability to represent hierarchical features, an ability that relies heavily on the training set and training procedure. Thus, protecting the training process and deep learning algorithms is paramount for privacy preservation. Although Differential Privacy (DP), as a powerful cryptographic primitive, has achieved satisfactory results in deep learning training, existing schemes still fall short in preserving model utility, i.e., they either require a high noise scale or inevitably harm the original gradients. To address these issues, in this paper we present a more robust and provably secure approach for differentially private training called GReDP. Specifically, we compute the model gradients in the frequency domain and adopt a new approach to reduce the noise level. Unlike previous work, GReDP requires only half the noise scale of DPSGD [1] while keeping all the gradient information intact. We present a detailed analysis of our method both theoretically and empirically. The experimental results show that GReDP performs consistently better than the baselines on all models and training settings.
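For context on the baseline this abstract compares against, the following is a minimal sketch of one DPSGD-style aggregation step: per-sample gradients are clipped to a fixed L2 norm, summed, perturbed with Gaussian noise, and averaged. It does not implement GReDP's frequency-domain computation, which the abstract does not specify. The function names and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_noisy_mean(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * clip_norm, the usual
    calibration for a clipped sum (an assumption of this sketch).
    """
    dim = len(per_sample_grads[0])
    clipped = [clip_gradient(g, clip_norm) for g in per_sample_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dpsgd_noisy_mean(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

In this sketch, clipping is what changes the original gradients and `noise_multiplier` is the noise scale; these are the two utility costs the abstract says existing schemes trade off.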
CV | Jun 4, 2025 | Code
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu, Tangyu Jiang, Shuning Jia et al.
The Transformer architecture has revolutionized various fields since it was proposed, and its effectiveness largely depends on the ability to encode positional information. Traditional position encoding methods exhibit significant limitations due to their lack of robustness and flexibility with respect to position. Rotary Positional Encoding (RoPE) was therefore proposed to alleviate these issues; it integrates positional information by rotating the embeddings in the attention mechanism. However, RoPE requires manually defined rotation matrices with a limited transformation space, constraining the model's capacity. In this work, we propose ComRoPE, which generalizes RoPE by defining it in terms of trainable commuting angle matrices. Specifically, we demonstrate that pairwise commutativity of these matrices is essential for RoPE to achieve scalability and positional robustness. We formally define the RoPE Equation, which is an essential condition that ensures consistent performance under position offsets. Based on the theoretical analysis, we present two types of trainable commuting angle matrices as sufficient solutions to the RoPE Equation. These significantly improve performance, surpassing the current state-of-the-art method by 1.6% at training resolution and 2.9% at higher resolution on the ImageNet-1K dataset. Furthermore, our framework generalizes existing RoPE formulations and offers new insights for future positional encoding research. To ensure reproducibility, the source code and instructions are available at https://github.com/Longin-Yu/ComRoPE
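The offset-consistency property mentioned above can be illustrated with the simplest case, a standard two-dimensional RoPE rotation. This is a minimal sketch of fixed-angle RoPE, not of ComRoPE's trainable angle matrices; the function names and the value of `theta` are assumptions chosen for illustration. Because planar rotations commute, the attention score between a rotated query and a rotated key depends only on the difference of their positions.

```python
import math


def rotate(vec, position, theta):
    """Rotate a 2D vector by the angle position * theta."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def score(q, k, m, n, theta=0.1):
    """Dot product of q rotated to position m and k rotated to position n."""
    rq = rotate(q, m, theta)
    rk = rotate(k, n, theta)
    return rq[0] * rk[0] + rq[1] * rk[1]


if __name__ == "__main__":
    q, k = (1.0, 2.0), (0.5, -1.0)
    # Same relative offset (m - n = 2) at two different absolute positions.
    print(score(q, k, 3, 1), score(q, k, 10, 8))
    # A different relative offset gives a different score.
    print(score(q, k, 3, 2))
```

The first two printed values agree up to floating-point error, since both pairs share the offset m - n = 2. With general, non-commuting angle matrices this cancellation would not be guaranteed, which is the role the abstract assigns to pairwise commutativity.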
CV | May 19, 2025 | Code
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking
Zihan Su, Xuerui Qiu, Hongbin Xu et al.
The explosive growth of generative video models has amplified the demand for reliable copyright preservation of AI-generated content. Despite its popularity in image synthesis, invisible generative watermarking remains largely underexplored in video generation. To address this gap, we propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. Motivated by the observation that watermarking performance is closely tied to the visual similarity between the watermark and cover content, we introduce a hierarchical coarse-to-fine adaptive matching mechanism. Specifically, the watermark image is divided into patches, each assigned to the most visually similar video frame, and further localized to the optimal spatial region for seamless embedding. To enable spatiotemporal fusion of watermark patches across video frames, we develop a 3D wavelet transform-enhanced Mamba architecture with a novel spatiotemporal local scanning strategy, effectively modeling long-range dependencies during watermark embedding and retrieval. To the best of our knowledge, this is the first attempt to apply state space models to watermarking, opening new avenues for efficient and robust watermark protection. Extensive experiments demonstrate that Safe-Sora achieves state-of-the-art performance in terms of video quality, watermark fidelity, and robustness, which is largely attributed to our proposals. Code is publicly available at https://github.com/Sugewud/Safe-Sora
CV | Feb 13, 2025
DiffoRA: Enabling Parameter-Efficient Fine-Tuning via Differential Module Selection
Tangyu Jiang, Haodi Wang, Chun Yuan
Parameter-Efficient Fine-Tuning (PEFT) methods have been extensively researched for adapting large language models to downstream tasks. Among the existing approaches, Low-Rank Adaptation (LoRA) has gained popularity for its streamlined design, which incorporates low-rank matrices into existing pre-trained models. Though effective, LoRA and its adaptive variants either allocate the same matrix to all modules or adjust the internal rank of the components based on importance scoring indicators. In this paper, we argue that not all modules in LLMs are suitable for, or in need of, fine-tuning. Motivated by this insight, we propose a new PEFT scheme called DiffoRA, which enables adaptive adoption of the low-rank decomposition matrices. At the core of DiffoRA lies a Differential Adaptation Matrix (DAM) that determines which modules are the most suitable and essential for fine-tuning. We theoretically explain how the designed matrix impacts the convergence rate and generalization capability of a pre-trained model. We then construct the DAM via continuous relaxation and discretization with weight-sharing optimizations. We fully implement DiffoRA and design comprehensive experiments to evaluate its performance. The experimental results demonstrate that DiffoRA delivers state-of-the-art results across multiple benchmarks.
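As a rough illustration of module-wise selection, the sketch below applies a LoRA-style low-rank update W + BA only to modules flagged in a binary selection map. The selection map is a hypothetical stand-in for a discretized DAM; how DiffoRA actually constructs it (continuous relaxation, discretization, weight sharing) is not reproduced here. All names are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def effective_weight(w, lora_b, lora_a, selected):
    """Return W + B @ A if the module is selected, else a copy of W."""
    if not selected:
        return [row[:] for row in w]
    delta = matmul(lora_b, lora_a)
    return [
        [w[i][j] + delta[i][j] for j in range(len(w[0]))] for i in range(len(w))
    ]


def adapt_modules(weights, adapters, selection):
    """Apply low-rank updates only to the modules marked True in selection."""
    return {
        name: effective_weight(weights[name], adapters[name][0],
                               adapters[name][1], selection[name])
        for name in weights
    }


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weights = {"attn.q": identity, "attn.v": identity}
    # Rank-1 adapters: B is 2x1 and A is 1x2.
    adapters = {name: ([[1.0], [2.0]], [[3.0, 4.0]]) for name in weights}
    selection = {"attn.q": True, "attn.v": False}  # hypothetical selection
    print(adapt_modules(weights, adapters, selection))
```

Unselected modules keep their pre-trained weights and need no adapter parameters, which corresponds to the abstract's point that not every module has to be fine-tuned.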