Mu Zhang

AI
h-index: 7
Papers: 3
Citations: 62
Novelty: 60%
AI Score: 38

3 Papers

CV · Dec 11, 2024
CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis

Mu Zhang, Yunfan Liu, Yue Liu et al.

Existing image synthesis methods for natural scenes focus primarily on foreground control, often reducing the background to simplistic textures. Consequently, these approaches tend to overlook the intrinsic correlation between foreground and background, which may lead to incoherent and unrealistic synthesis results in remote sensing (RS) scenarios. In this paper, we introduce CC-Diff, a Diffusion Model-based approach for RS image generation with enhanced Context Coherence. Specifically, we propose a novel Dual Re-sampler for feature extraction, with a built-in 'Context Bridge' to explicitly capture the intricate interdependency between foreground and background. Moreover, we reinforce their connection by employing a foreground-aware attention mechanism during the generation of background features, thereby enhancing the plausibility of the synthesized context. Extensive experiments show that CC-Diff outperforms state-of-the-art methods across critical quality metrics, excelling in the RS domain and effectively generalizing to natural images. Remarkably, CC-Diff also shows high trainability, boosting detection accuracy by 1.83 mAP on DOTA and 2.25 mAP on the COCO benchmark.

AI · Oct 3, 2025
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models

Tianren Ma, Mu Zhang, Yibing Wang et al.

Optimizing discrete diffusion models (DDMs) with rewards remains a challenge: the non-autoregressive paradigm makes importance sampling intractable and rollout complex, which confounds reinforcement learning methods such as Group Relative Policy Optimization (GRPO). In this study, we introduce MaskGRPO, the first viable approach to enable scalable multimodal reinforcement learning in discrete diffusion with effective importance sampling and modality-specific adaptations. To this end, we first clarify the theoretical foundation for DDMs, which facilitates building an importance estimator that captures valuable token fluctuation for gradient updates. We then carefully tailor the rollout method for visual sequences, which yields diverse completions and reliable optimization gradients. On math reasoning, coding, and visual generation benchmarks, MaskGRPO brings more stable and efficient updates, leading to stronger reasoning performance and better generation quality. This study establishes MaskGRPO as a systematic policy optimization approach and the first practical way for discretized visual diffusion.
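
For readers unfamiliar with GRPO, the group-relative part of the name refers to scoring each sampled completion against the other completions drawn for the same prompt. The sketch below shows only that generic normalization step. It is not MaskGRPO's importance estimator or rollout method, which the abstract does not specify. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation and a small eps for stability;
    real implementations may differ in both choices.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # Completions that beat the group mean get a positive advantage.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical completions for one prompt, with binary rewards.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Because the advantages are centered within each group, no separate learned value function is needed to provide a baseline.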

PL · Sep 5, 2019
Duet: An Expressive Higher-order Language and Linear Type System for Statically Enforcing Differential Privacy

Joseph P. Near, David Darais, Chike Abuah et al.

During the past decade, differential privacy has become the gold standard for protecting the privacy of individuals. However, verifying that a particular program provides differential privacy often remains a manual task to be completed by an expert in the field. Language-based techniques have been proposed for fully automating proofs of differential privacy via type system design; however, these results have lagged behind advances in differentially private algorithms, leaving a noticeable gap in programs which can be automatically verified while also providing state-of-the-art bounds on privacy. We propose Duet, an expressive higher-order language, linear type system and tool for automatically verifying differential privacy of general-purpose higher-order programs. In addition to general-purpose programming, Duet supports encoding machine learning algorithms such as stochastic gradient descent, as well as common auxiliary data analysis tasks such as clipping, normalization and hyperparameter tuning, each of which is particularly challenging to encode in a statically verified differential privacy framework. We present a core design of the Duet language and linear type system, and complete key proofs about privacy for well-typed programs. We then show how to extend Duet to support realistic machine learning applications and recent variants of differential privacy which result in improved accuracy for many practical differentially private algorithms. Finally, we implement several differentially private machine learning algorithms in Duet which have never before been automatically verified by a language-based tool, and we present experimental results which demonstrate the benefits of Duet's language design in terms of accuracy of trained machine learning models.
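
To make the mention of clipping and stochastic gradient descent concrete, the sketch below shows the kind of computation a tool like Duet is meant to verify: per-example gradients are clipped to a fixed L2 norm, summed, and perturbed with Gaussian noise before averaging. This is plain Python, not Duet code, and it performs no privacy accounting. The function names and the `noise_multiplier` parameter are assumptions chosen for illustration.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum, add Gaussian noise, then average.

    Assumption: noise standard deviation is noise_multiplier * max_norm,
    the usual calibration to the clipped per-example contribution.
    """
    clipped = [clip_l2(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(clipped) for x in noisy]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    grads = [[3.0, 4.0], [0.0, 0.5]]  # hypothetical per-example gradients
    print(noisy_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single example can move the sum, which is what lets the noise scale be fixed in advance. Proving that bound statically, rather than by manual inspection, is the task the abstract describes.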