Mengyuan Ren

2papers

2 Papers

CVJul 9, 2025
PromptTea: Let Prompts Tell TeaCache the Optimal Threshold

Zishen Huang, Chunyu Yang, Mengyuan Ren

Despite recent progress in video generation, inference speed remains a major bottleneck. A common acceleration strategy involves reusing model outputs via caching mechanisms at fixed intervals. However, we find that such fixed-frequency reuse significantly degrades quality in complex scenes, while manually tuning reuse thresholds is inefficient and lacks robustness. To address this, we propose Prompt-Complexity-Aware (PCA) caching, a method that automatically adjusts reuse thresholds based on scene complexity estimated directly from the input prompt. By incorporating prompt-derived semantic cues, PCA enables more adaptive and informed reuse decisions than conventional caching methods. We also revisit the assumptions behind TeaCache and identify a key limitation: it suffers from poor input-output relationship modeling due to an oversimplified prior. To overcome this, we decouple the noisy input, enhance the contribution of meaningful textual information, and improve the model's predictive accuracy through multivariate polynomial feature expansion. To further reduce computational cost, we replace the static CFGCache with DynCFGCache, a dynamic mechanism that selectively reuses classifier-free guidance (CFG) outputs based on estimated output variations. This allows for more flexible reuse without compromising output quality. Extensive experiments demonstrate that our approach achieves significant acceleration-for example, 2.79x speedup on the Wan2.1 model-while maintaining high visual fidelity across a range of scenes.

CVJul 31, 2019
Cartoon Face Recognition: A Benchmark Dataset

Yi Zheng, Yifan Zhao, Mengyuan Ren et al.

Recent years have witnessed increasing attention in cartoon media, powered by the strong demands of industrial applications. As the first step to understand this media, cartoon face recognition is a crucial but less-explored task with few datasets proposed. In this work, we first present a new challenging benchmark dataset, consisting of 389,678 images of 5,013 cartoon characters annotated with identity, bounding box, pose, and other auxiliary attributes. The dataset, named iCartoonFace, is currently the largest-scale, high-quality, richannotated, and spanning multiple occurrences in the field of image recognition, including near-duplications, occlusions, and appearance changes. In addition, we provide two types of annotations for cartoon media, i.e., face recognition, and face detection, with the help of a semi-automatic labeling algorithm. To further investigate this challenging dataset, we propose a multi-task domain adaptation approach that jointly utilizes the human and cartoon domain knowledge with three discriminative regularizations. We hence perform a benchmark analysis of the proposed dataset and verify the superiority of the proposed approach in the cartoon face recognition task. We believe this public availability will attract more research attention in broad practical application scenarios.