Yahui Wang

CV
h-index11
10papers
163citations
Novelty34%
AI Score51

10 Papers

58.1CVApr 12
NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models: Datasets, Methods and Results

Xin Li, Jiachao Gong, Xijun Wang et al.

This paper presents an overview of the NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models. This challenge utilizes a new short-form UGC (S-UGC) video restoration benchmark, termed KwaiVIR, which is contributed by USTC and Kuaishou Technology. It contains both synthetically distorted videos and real-world short-form UGC videos in the wild. For this edition, the released data include 200 synthetic training videos, 48 wild training videos, 11 validation videos, and 20 testing videos. The primary goal of this challenge is to establish a strong and practical benchmark for restoring short-form UGC videos under complex real-world degradations, especially in the emerging paradigm of generative-model-based S-UGC video restoration. This challenge has two tracks: (i) the primary track is a subjective track, where the evaluation is based on a user study; (ii) the second track is an objective track. These two tracks enable a comprehensive assessment of restoration quality. In total, 95 teams have registered for this competition. And 12 teams submitted valid final solutions and fact sheets for the testing phase. The submitted methods achieved strong performance on the KwaiVIR benchmark, demonstrating encouraging progress in short-form UGC video restoration in the wild.

56.4CVApr 8
NTIRE 2026 Challenge on Bitstream-Corrupted Video Restoration: Methods and Results

Wenbin Zou, Tianyi Li, Kejun Wu et al.

This paper reports on the NTIRE 2026 Challenge on Bitstream-Corrupted Video Restoration (BSCVR). The challenge aims to advance research on recovering visually coherent videos from corrupted bitstreams, whose decoding often produces severe spatial-temporal artifacts and content distortion. Built upon recent progress in bitstream-corrupted video recovery, the challenge provides a common benchmark for evaluating restoration methods under realistic corruption settings. We describe the dataset, evaluation protocol, and participating methods, and summarize the final results and main technical trends. The challenge highlights the difficulty of this emerging task and provides useful insights for future research on robust video restoration under practical bitstream corruption.

SIMar 28, 2023Code
Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

Xinxin Hu, Haotian Chen, Hongchang Chen et al.

With the rapid development of mobile networks, the people's social contacts have been considerably facilitated. However, the rise of mobile social network fraud upon those networks, has caused a great deal of distress, in case of depleting personal and social wealth, then potentially doing significant economic harm. To detect fraudulent users, call detail record (CDR) data, which portrays the social behavior of users in mobile networks, has been widely utilized. But the imbalance problem in the aforementioned data, which could severely hinder the effectiveness of fraud detectors based on graph neural networks(GNN), has hardly been addressed in previous work. In this paper, we are going to present a novel Cost-Sensitive Graph Neural Network (CSGNN) by creatively combining cost-sensitive learning and graph neural networks. We conduct extensive experiments on two open-source realworld mobile network fraud datasets. The results show that CSGNN can effectively solve the graph imbalance problem and then achieve better detection performance than the state-of-the-art algorithms. We believe that our research can be applied to solve the graph imbalance problems in other fields. The CSGNN code and datasets are publicly available at https://github.com/xxhu94/CSGNN.

LGMar 29, 2023Code
GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

Xinxin Hu, Haotian Chen, Junjie Zhang et al.

Along with the rapid evolution of mobile communication technologies, such as 5G, there has been a drastically increase in telecom fraud, which significantly dissipates individual fortune and social wealth. In recent years, graph mining techniques are gradually becoming a mainstream solution for detecting telecom fraud. However, the graph imbalance problem, caused by the Pareto principle, brings severe challenges to graph data mining. This is a new and challenging problem, but little previous work has been noticed. In this paper, we propose a Graph ATtention network with COst-sensitive BOosting (GAT-COBO) for the graph imbalance problem. First, we design a GAT-based base classifier to learn the embeddings of all nodes in the graph. Then, we feed the embeddings into a well-designed cost-sensitive learner for imbalanced learning. Next, we update the weights according to the misclassification cost to make the model focus more on the minority class. Finally, we sum the node embeddings obtained by multiple cost-sensitive learners to obtain a comprehensive node representation, which is used for the downstream anomaly detection task. Extensive experiments on two real-world telecom fraud detection datasets demonstrate that our proposed method is effective for the graph imbalance problem, outperforming the state-of-the-art GNNs and GNN-based fraud detectors. In addition, our model is also helpful for solving the widespread over-smoothing problem in GNNs. The GAT-COBO code and datasets are available at https://github.com/xxhu94/GAT-COBO.

29.4CVApr 14
LoViF 2026 The First Challenge on Weather Removal in Videos

Chenghao Qian, Xin Li, Yeying Jin et al.

This paper presents a review of the LoViF 2026 Challenge on Weather Removal in Videos. The challenge encourages the development of methods for restoring clean videos from inputs degraded by adverse weather conditions such as rain and snow, with an emphasis on achieving visually plausible and temporally consistent results while preserving scene structure and motion dynamics. To support this task, we introduce a new short-form WRV dataset tailored for video weather removal. It consists of 18 videos 1,216 synthesized frames paired with 1,216 real-world ground-truth frames at a resolution of 832 x 480, and is split into training, validation, and test sets with a ratio of 1:1:1. The goal of this challenge is to advance robust and realistic video restoration under real-world weather conditions, with evaluation protocols that jointly consider fidelity and perceptual quality. The challenge attracted 37 participants and received 5 valid final submissions with corresponding fact sheets, contributing to progress in weather removal for videos. The project is publicly available at https://www.codabench.org/competitions/13462/.

NAJan 7, 2019
A High-Order Modified Finite Volume WENO Method on 3D Cartesian Grids

Yulong Du, Li Yuan*, Yahui Wang

The modified dimension-by-dimension finite volume (FV) WENO method on Cartesian grids proposed by Buchmüller and Helzel can retain the full order of accuracy of the one-dimensional WENO reconstruction and requires only one flux computation per interface. The high-order accurate conversion between face-averaged values and face-center point values is the main ingredient of this method. In this paper, we derive sixth-order accurate conversion formulas on three-dimensional Cartesian grids. It is shown that the resulting modified FV WENO method is efficient and high-order accurate when applied to smooth nonlinear multidimensional problems, and is robust for calculating non-smooth nonlinear problems with strong shocks..

LGDec 29, 2025
Spectral Analysis of Hard-Constraint PINNs: The Spatial Modulation Mechanism of Boundary Functions

Yuchen Xie, Honghang Chi, Haopeng Quan et al.

Physics-Informed Neural Networks with hard constraints (HC-PINNs) are increasingly favored for their ability to strictly enforce boundary conditions via a trial function ansatz $\tilde{u} = A + B \cdot N$, yet the theoretical mechanisms governing their training dynamics have remained unexplored. Unlike soft-constrained formulations where boundary terms act as additive penalties, this work reveals that the boundary function $B$ introduces a multiplicative spatial modulation that fundamentally alters the learning landscape. A rigorous Neural Tangent Kernel (NTK) framework for HC-PINNs is established, deriving the explicit kernel composition law. This relationship demonstrates that the boundary function $B(\vec{x})$ functions as a spectral filter, reshaping the eigenspectrum of the neural network's native kernel. Through spectral analysis, the effective rank of the residual kernel is identified as a deterministic predictor of training convergence, superior to classical condition numbers. It is shown that widely used boundary functions can inadvertently induce spectral collapse, leading to optimization stagnation despite exact boundary satisfaction. Validated across multi-dimensional benchmarks, this framework transforms the design of boundary functions from a heuristic choice into a principled spectral optimization problem, providing a solid theoretical foundation for geometric hard constraints in scientific machine learning.

63.3CVApr 21
LoViF 2026 Challenge on Real-World All-in-One Image Restoration: Methods and Results

Xiang Chen, Hao Li, Jiangxin Dong et al.

This paper presents a review for the LoViF Challenge on Real-World All-in-One Image Restoration. The challenge aimed to advance research on real-world all-in-one image restoration under diverse real-world degradation conditions, including blur, low-light, haze, rain, and snow. It provided a unified benchmark to evaluate the robustness and generalization ability of restoration models across multiple degradation categories within a common framework. The competition attracted 124 registered participants and received 9 valid final submissions with corresponding fact sheets, significantly contributing to the progress of real-world all-in-one image restoration. This report provides a detailed analysis of the submitted methods and corresponding results, emphasizing recent progress in unified real-world image restoration. The analysis highlights effective approaches and establishes a benchmark for future research in real-world low-level vision.

CVAug 21, 2025
MedRepBench: A Comprehensive Benchmark for Medical Report Interpretation

Fangxin Shang, Yuan Xia, Dalu Yang et al.

Medical report interpretation plays a crucial role in healthcare, enabling both patient-facing explanations and effective information flow across clinical systems. While recent vision-language models (VLMs) and large language models (LLMs) have demonstrated general document understanding capabilities, there remains a lack of standardized benchmarks to assess structured interpretation quality in medical reports. We introduce MedRepBench, a comprehensive benchmark built from 1,900 de-identified real-world Chinese medical reports spanning diverse departments, patient demographics, and acquisition formats. The benchmark is designed primarily to evaluate end-to-end VLMs for structured medical report understanding. To enable controlled comparisons, we also include a text-only evaluation setting using high-quality OCR outputs combined with LLMs, allowing us to estimate the upper-bound performance when character recognition errors are minimized. Our evaluation framework supports two complementary protocols: (1) an objective evaluation measuring field-level recall of structured clinical items, and (2) an automated subjective evaluation using a powerful LLM as a scoring agent to assess factuality, interpretability, and reasoning quality. Based on the objective metric, we further design a reward function and apply Group Relative Policy Optimization (GRPO) to improve a mid-scale VLM, achieving up to 6% recall gain. We also observe that the OCR+LLM pipeline, despite strong performance, suffers from layout-blindness and latency issues, motivating further progress toward robust, fully vision-based report understanding.

RONov 30, 2018
CubemapSLAM: A Piecewise-Pinhole Monocular Fisheye SLAM System

Yahui Wang, Shaojun Cai, Shi-Jie Li et al.

We present a real-time feature-based SLAM (Simultaneous Localization and Mapping) system for fisheye cameras featured by a large field-of-view (FoV). Large FoV cameras are beneficial for large-scale outdoor SLAM applications, because they increase visual overlap between consecutive frames and capture more pixels belonging to the static parts of the environment. However, current feature-based SLAM systems such as PTAM and ORB-SLAM limit their camera model to pinhole only. To compensate for the vacancy, we propose a novel SLAM system with the cubemap model that utilizes the full FoV without introducing distortion from the fisheye lens, which greatly benefits the feature matching pipeline. In the initialization and point triangulation stages, we adopt a unified vector-based representation to efficiently handle matches across multiple faces, and based on this representation we propose and analyze a novel inlier checking metric. In the optimization stage, we design and test a novel multi-pinhole reprojection error metric that outperforms other metrics by a large margin. We evaluate our system comprehensively on a public dataset as well as a self-collected dataset that contains real-world challenging sequences. The results suggest that our system is more robust and accurate than other feature-based fisheye SLAM approaches. The CubemapSLAM system has been released into the public domain.