71.4LGMay 28Code
LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning DistillationTianrun Yu, Kaixiang Zhao, Chih-Chun Chen et al.
We study trajectory selection for reasoning distillation, where teacher-generated reasoning trajectories are selectively used as supervision for a student model. Existing methods rely on heuristics such as trajectory quality or model confidence, but they often overlook whether a trajectory is learnable by the student. In this paper, we present LARK, a learnability-grounded method for reasoning trajectory selection. LARK selects trajectories that the student can learn efficiently while preserving the generalization of the full training distribution. At the core of LARK is a learnability factor $ρ$, which characterizes the rate at which the student's training loss decreases. To estimate this rate efficiently and maintain generalization, we introduce a learnability proxy and a $χ^2$-regularized selection policy that balances learnability and distributional coverage, both with strong theoretical guarantees on their estimation error. Empirically, LARK consistently outperforms data selection baselines across multiple base models and reasoning tasks. Diagnostic analyses show that the LARK score predicts downstream training utility and that LARK-selected trajectories induce faster supervised fine-tuning loss reduction. Our code is available at https://github.com/Tianrun-Yu/LARK.
44.2AIMay 29
TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal GenerationKaixiang Zhao, Tianrun Yu, Shawn Huang et al.
We study fact-level repair for multimodal generation, where a fluent output may contain specific facts that are not supported by the input. Existing inference-time repair methods often generate feedback by jointly conditioning on the input and the current output. This design has two limitations: hallucinated claims in the output can bias the model's interpretation of the input, and free-form feedback cannot be ranked or scheduled at the fact level. We present TIGER, an inference-time framework that redesigns feedback for localized repair. TIGER independently extracts an observation graph from the input and a claim graph from the current output, then assigns each claim a graph-conditioned risk score based on support and conflict. The model repairs selected high-risk claims while keeping the backbone frozen. We provide a convergence analysis showing that the expected total risk decreases geometrically to an explicit asymptotic bound under mild assumptions. Experiments across four cross-modal paths, including image-to-text, image+text-to-text, audio-to-text, and video-to-text, show that TIGER reduces unsupported content while preserving task quality. The gains hold across multiple backbones, and a CrisisFACTS case study suggests that the same repair mechanism can improve grounding in multi-source settings.
6.0CVMay 12Code
FRAME: Forensic Routing and Adaptive Multi-path Evidence Fusion for Image Manipulation DetectionKaixiang Zhao, Tianrun Yu, Aoxu Zhang et al.
The proliferation of sophisticated image editing tools and generative artificial intelligence models has made verifying the authenticity of digital images increasingly challenging, with important implications for journalism, forensic analysis, and public trust. Although numerous forensic algorithms, ranging from handcrafted methods to deep learning-based detectors, have been developed for manipulation detection, individual methods often suffer from limited robustness, fragmented evidence, or weak generalization across manipulation types and image conditions. To address these limitations, we present \textbf{FRAME}, a method for \textbf{F}orensic \textbf{R}outing and \textbf{A}daptive \textbf{M}ulti-path \textbf{E}vidence fusion for image manipulation detection. FRAME organizes diverse forensic algorithms into a multi-path analysis space, adaptively selects informative forensic paths for each input image, and fuses complementary evidence to improve detection and localization performance. By moving beyond single-method analysis and fixed fusion strategies, FRAME provides a more robust and flexible approach to image forensic reasoning while preserving interpretable forensic cues from multiple evidence sources. Experimental results demonstrate the effectiveness of FRAME across diverse manipulation scenarios. Code is available at \href{https://github.com/kzhao5/FRAME}{https://github.com/kzhao5/FRAME}.