Jiarui Yang

h-index: 5

3 papers

23 citations

Novelty: 53%

AI Score: 48

Ranked #29,295 of 194,257 authors (top 15%); #10,496 in CV (top 18%)

3 Papers

8.7 | CV | Dec 21, 2024 | Code

Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution

Jiarui Yang, Tao Dai, Yufei Zhu et al.

Diffusion models represent the state-of-the-art in generative modeling. Due to their high training costs, many works leverage pre-trained diffusion models' powerful representations for downstream tasks, such as face super-resolution (FSR), through fine-tuning or prior-based methods. However, relying solely on priors without supervised training makes it challenging to meet the pixel-level accuracy requirements of discriminative tasks. Although prior-based methods can achieve high-fidelity, high-quality results, ensuring consistency remains a significant challenge. In this paper, we propose a masking strategy with strong and weak constraints and iterative refinement for real-world FSR, termed Diffusion Prior Interpolation (DPI). We introduce conditions and constraints on consistency by masking different sampling stages based on the structural characteristics of the face. Furthermore, we propose a condition Corrector (CRT) to establish a reciprocal posterior sampling process, enhancing FSR performance by mutual refinement of conditions and samples. DPI can balance consistency and diversity and can be seamlessly integrated into pre-trained models. In extensive experiments conducted on synthetic and real datasets, along with consistency validation in face recognition, DPI demonstrates superiority over SOTA FSR methods. The code is available at https://github.com/JerryYann/DPI.

3.6 | CV | Aug 13, 2025 | Code

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

Wen Huang, Jiarui Yang, Tao Dai et al.

Visual manipulation localization (VML) aims to identify tampered regions in images and videos, a task that has become increasingly challenging with the rise of advanced editing tools. Existing methods face two main issues: resolution diversity, where resizing or padding distorts forensic traces and reduces efficiency, and the modality gap, as images and videos often require separate models. To address these challenges, we propose RelayFormer, a unified framework that adapts to varying resolutions and modalities. RelayFormer partitions inputs into fixed-size sub-images and introduces Global-Local Relay (GLR) tokens, which propagate structured context through a global-local relay attention (GLRA) mechanism. This enables efficient exchange of global cues, such as semantic or temporal consistency, while preserving fine-grained manipulation artifacts. Unlike prior methods that rely on uniform resizing or sparse attention, RelayFormer naturally scales to arbitrary resolutions and video sequences without excessive overhead. Experiments across diverse benchmarks demonstrate that RelayFormer achieves state-of-the-art performance with notable efficiency, combining resolution adaptivity without interpolation or excessive padding, unified modeling for both images and videos, and a strong balance between accuracy and computational cost. Code is available at: https://github.com/WenOOI/RelayFormer.
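The partitioning step described in the abstract above can be illustrated with a minimal sketch. It only shows how an input of arbitrary resolution might be covered by fixed-size sub-images without resizing; it does not model GLR tokens or the GLRA mechanism, and it is not taken from the RelayFormer code. The function name `tile_grid`, its parameters, and the choice to clip edge tiles (which would then be padded up to the tile size) are assumptions made for illustration.

```python
def tile_grid(height, width, tile):
    """Cover a height x width input with fixed-size tiles (hypothetical helper).

    Returns a list of (top, left, bottom, right) boxes in row-major order and
    the (rows, cols) shape of the grid. Edge boxes are clipped to the input
    bounds, so only those tiles would need padding up to `tile` pixels.
    """
    rows = -(-height // tile)  # ceiling division
    cols = -(-width // tile)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            top, left = r * tile, c * tile
            bottom = min(top + tile, height)
            right = min(left + tile, width)
            boxes.append((top, left, bottom, right))
    return boxes, (rows, cols)


def video_tile_grid(num_frames, height, width, tile):
    """Apply the same grid to every frame of a video (assumed per-frame tiling)."""
    boxes, grid = tile_grid(height, width, tile)
    return [(t, box) for t in range(num_frames) for box in boxes], grid
```

Under these assumptions, an image and a video share the same tiling routine, which mirrors the abstract's point about handling both modalities with one framework; the input is never interpolated, and padding is limited to the tiles on the right and bottom edges.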

7.6 | CV | Dec 16, 2023

mmBaT: A Multi-task Framework for mmWave-based Human Body Reconstruction and Translation Prediction

Jiarui Yang, Songpengcheng Xia, Yifan Song et al.

Human body reconstruction with Millimeter Wave (mmWave) radar point clouds has gained significant interest due to its ability to work in adverse environments and its capacity to mitigate privacy concerns associated with traditional camera-based solutions. Despite pioneering efforts in this field, two challenges persist. Firstly, raw point clouds contain a large number of noise points, usually caused by ambient objects and multi-path effects of Radio Frequency (RF) signals. Recent approaches typically rely on prior knowledge or elaborate preprocessing methods, limiting their applicability. Secondly, even after noise removal, the sparse and inconsistent body-related points pose an obstacle to accurate human body reconstruction. To address these challenges, we introduce mmBaT, a novel multi-task deep learning framework that concurrently estimates the human body and predicts body translations in subsequent frames to extract body-related point clouds. Our method is evaluated on two public datasets that are collected with different radar devices and noise levels. A comprehensive comparison against other state-of-the-art methods demonstrates that our method achieves superior reconstruction performance and generalization ability from noisy raw data, even when compared to methods provided with body-related point clouds.