Chaoyang Li

CV
h-index17
4papers
3citations
Novelty57%
AI Score52

4 Papers

CVFeb 23
Decoupling Defense Strategies for Robust Image Watermarking

Jiahui Chen, Zehang Deng, Zeyu Zhang et al.

Deep learning-based image watermarking, while robust against conventional distortions, remains vulnerable to advanced adversarial and regeneration attacks. Conventional countermeasures, which jointly optimize the encoder and decoder via a noise layer, face 2 inevitable challenges: (1) decrease of clean accuracy due to decoder adversarial training and (2) limited robustness due to simultaneous training of all three advanced attacks. To overcome these issues, we propose AdvMark, a novel two-stage fine-tuning framework that decouples the defense strategies. In stage 1, we address adversarial vulnerability via a tailored adversarial training paradigm that primarily fine-tunes the encoder while only conditionally updating the decoder. This approach learns to move the image into a non-attackable region, rather than modifying the decision boundary, thus preserving clean accuracy. In stage 2, we tackle distortion and regeneration attacks via direct image optimization. To preserve the adversarial robustness gained in stage 1, we formulate a principled, constrained image loss with theoretical guarantees, which balances the deviation from cover and previous encoded images. We also propose a quality-aware early-stop to further guarantee the lower bound of visual quality. Extensive experiments demonstrate AdvMark outperforms with the highest image quality and comprehensive robustness, i.e. up to 29\%, 33\% and 46\% accuracy improvement for distortion, regeneration and adversarial attacks, respectively.

MMAug 22, 2025Code
Beyond Interpretability: Exploring the Comprehensibility of Adaptive Video Streaming through Large Language Models

Lianchen Jia, Chaoyang Li, Ziqi Yuan et al.

Over the past decade, adaptive video streaming technology has witnessed significant advancements, particularly driven by the rapid evolution of deep learning techniques. However, the black-box nature of deep learning algorithms presents challenges for developers in understanding decision-making processes and optimizing for specific application scenarios. Although existing research has enhanced algorithm interpretability through decision tree conversion, interpretability does not directly equate to developers' subjective comprehensibility. To address this challenge, we introduce \texttt{ComTree}, the first bitrate adaptation algorithm generation framework that considers comprehensibility. The framework initially generates the complete set of decision trees that meet performance requirements, then leverages large language models to evaluate these trees for developer comprehensibility, ultimately selecting solutions that best facilitate human understanding and enhancement. Experimental results demonstrate that \texttt{ComTree} significantly improves comprehensibility while maintaining competitive performance, showing potential for further advancement. The source code is available at https://github.com/thu-media/ComTree.

AIOct 21, 2025Code
Crucible: Quantifying the Potential of Control Algorithms through LLM Agents

Lianchen Jia, Chaoyang Li, Qian Houde et al.

Control algorithms in production environments typically require domain experts to tune their parameters and logic for specific scenarios. However, existing research predominantly focuses on algorithmic performance under ideal or default configurations, overlooking the critical aspect of Tuning Potential. To bridge this gap, we introduce Crucible, an agent that employs an LLM-driven, multi-level expert simulation to turn algorithms and defines a formalized metric to quantitatively evaluate their Tuning Potential. We demonstrate Crucible's effectiveness across a wide spectrum of case studies, from classic control tasks to complex computer systems, and validate its findings in a real-world deployment. Our experimental results reveal that Crucible systematically quantifies the tunable space across different algorithms. Furthermore, Crucible provides a new dimension for algorithm analysis and design, which ultimately leads to performance improvements. Our code is available at https://github.com/thu-media/Crucible.

CVApr 13
Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images

Zheng Jiang, Yiming Chen, Nan He et al.

Recent multimodal large language models (MLLMs) have begun to support Thinking with Images by invoking visual tools such as zooming and cropping during inference. Yet these systems remain brittle in fine-grained visual reasoning because they must decide where to look before they have access to the evidence needed to make that decision correctly. We identify this circular dependency as the Grounding Paradox. To address it, we propose Test-Time Scaling over Perception (TTSP), a framework that treats perception itself as a scalable inference process. TTSP generates multiple exploratory perception traces, filters unreliable traces using entropy-based confidence estimation, distills validated observations into structured knowledge, and iteratively refines subsequent exploration toward unresolved uncertainty. Extensive experiments on high-resolution and general multimodal reasoning benchmarks show that TTSP consistently outperforms strong baselines across backbone sizes, while also exhibiting favorable scalability and token efficiency. Our results suggest that scaling perception at test time is a promising direction for robust multimodal reasoning under perceptual uncertainty.