33.2LGMay 14Code
TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM RoutingPei Yang, Wanyi Chen, Tongyun Yang et al.
LLM routing matters most in long-horizon applications such as coding agents, deep research systems, and computer-use agents, where a single user request triggers many model calls. Routing each call to the cheapest sufficient model can cut costs without sacrificing quality, yet existing router benchmarks evaluate routers only on one-shot prompts. They never expose the router-visible prefix at an intermediate agent step, never test whether a cheaper replacement preserves downstream task success, and often rely on online LLM judges at evaluation time. We introduce TwinRouterBench, a step-level routing benchmark with two tracks. The static track provides 970 router-visible prefixes from 520 instances across SWE-bench, BFCL, mtRAG, QMSum, and PinchBench, each paired with an execution-verified target tier estimated under a released downgrade-and-cascade protocol; scoring is deterministic arithmetic over tier labels, trajectory membership, and token costs, with no online evaluator-side LLM judge. The dynamic track supplies a harness that runs routers on the full 500-case SWE-bench Verified suite; in this paper we report a 100-case held-out evaluation disjoint from the static SWE supervision split. At each LLM call the router selects a concrete model from a locked pool, and success is measured by official task resolution and realized API spend. The two tracks support fast offline iteration followed by end-to-end validation under live agent execution. Code and data are available at https://github.com/CommonstackAI/TwinRouterBench.
CVAug 28, 2025
Reverse Imaging for Wide-spectrum Generalization of Cardiac MRI SegmentationYidong Zhao, Peter Kellman, Hui Xue et al.
Pretrained segmentation models for cardiac magnetic resonance imaging (MRI) struggle to generalize across different imaging sequences due to significant variations in image contrast. These variations arise from changes in imaging protocols, yet the same fundamental spin properties, including proton density, T1, and T2 values, govern all acquired images. With this core principle, we introduce Reverse Imaging, a novel physics-driven method for cardiac MRI data augmentation and domain adaptation to fundamentally solve the generalization problem. Our method reversely infers the underlying spin properties from observed cardiac MRI images, by solving ill-posed nonlinear inverse problems regularized by the prior distribution of spin properties. We acquire this "spin prior" by learning a generative diffusion model from the multiparametric SAturation-recovery single-SHot acquisition sequence (mSASHA) dataset, which offers joint cardiac T1 and T2 maps. Our method enables approximate but meaningful spin-property estimates from MR images, which provide an interpretable "latent variable" that lead to highly flexible image synthesis of arbitrary novel sequences. We show that Reverse Imaging enables highly accurate segmentation across vastly different image contrasts and imaging protocols, realizing wide-spectrum generalization of cardiac MRI segmentation.