Junru Lin

h-index30
2papers

2 Papers

CVSep 26, 2025
VideoScore2: Think before You Score in Generative Video Evaluation

Xuan He, Dongfu Jiang, Ping Nie et al. · utoronto

Recent advances in text-to-video generation have produced increasingly realistic and diverse content, yet evaluating such videos remains a fundamental challenge due to their multi-faceted nature encompassing visual quality, semantic alignment, and physical consistency. Existing evaluators and reward models are limited to single opaque scores, lack interpretability, or provide only coarse analysis, making them insufficient for capturing the comprehensive nature of video quality assessment. We present VideoScore2, a multi-dimensional, interpretable, and human-aligned framework that explicitly evaluates visual quality, text-to-video alignment, and physical/common-sense consistency while producing detailed chain-of-thought rationales. Our model is trained on a large-scale dataset VideoFeedback2 containing 27,168 human-annotated videos with both scores and reasoning traces across three dimensions, using a two-stage pipeline of supervised fine-tuning followed by reinforcement learning with Group Relative Policy Optimization (GRPO) to enhance analytical robustness. Extensive experiments demonstrate that VideoScore2 achieves superior performance with 44.35 (+5.94) accuracy on our in-domain benchmark VideoScore-Bench-v2 and 50.37 (+4.32) average performance across four out-of-domain benchmarks (VideoGenReward-Bench, VideoPhy2, etc), while providing interpretable assessments that bridge the gap between evaluation and controllable generation through effective reward modeling for Best-of-N sampling. Project Page: https://tiger-ai-lab.github.io/VideoScore2/

CVDec 20, 2023
Ternary-Type Opacity and Hybrid Odometry for RGB NeRF-SLAM

Junru Lin, Asen Nachkov, Songyou Peng et al.

In this work, we address the challenge of deploying Neural Radiance Field (NeRFs) in Simultaneous Localization and Mapping (SLAM) under the condition of lacking depth information, relying solely on RGB inputs. The key to unlocking the full potential of NeRF in such a challenging context lies in the integration of real-world priors. A crucial prior we explore is the binary opacity prior of 3D space with opaque objects. To effectively incorporate this prior into the NeRF framework, we introduce a ternary-type opacity (TT) model instead, which categorizes points on a ray intersecting a surface into three regions: before, on, and behind the surface. This enables a more accurate rendering of depth, subsequently improving the performance of image warping techniques. Therefore, we further propose a novel hybrid odometry (HO) scheme that merges bundle adjustment and warping-based localization. Our integrated approach of TT and HO achieves state-of-the-art performance on synthetic and real-world datasets, in terms of both speed and accuracy. This breakthrough underscores the potential of NeRF-SLAM in navigating complex environments with high fidelity.