Yue Ren

92.9 | CV | May 28

CardioLens: Revealing the Clinical Reality Gap of MLLMs via Multi-Sequence Cardiac MRI Evaluations

Zixian Su, Hongkai Zhang, Fan Gao et al.

Multimodal Large Language Models (MLLMs) have shown strong performance on public medical benchmarks, yet existing evaluations often remain weak proxies for clinical use, relying on isolated inputs and simplified recognition-style tasks. We introduce CardioLens, a leakage-resistant evaluation testbed for multi-sequence Cardiovascular Magnetic Resonance (CMR), constructed from private hospital archives through a rigorous report-to-QA construction and verification pipeline. CardioLens contains 473,896 slices and 13,494 verified QA pairs across 4D Cine, LGE, perfusion, and T2-weighted imaging, and evaluates three stages of CMR interpretation: image understanding, report generation, and disease diagnosis. Across 24 state-of-the-art MLLMs, CardioLens reveals a substantial clinical reality gap: models perform poorly overall, with performance degrading along the real CMR workflow. Confusion analysis further shows a category-collapse failure mode, where models default to frequent abnormal categories rather than distinguishing clinically distinct findings. To rule out MLLM-compatible input construction as the primary cause, we compare random, clinically motivated, and data-driven slice selection protocols under different slice budgets; performance changes only marginally, typically by about 1%. Explicit reasoning prompts also fail to rescue performance, often making models more conservative rather than improving visual evidence use. These results show that current MLLMs remain far from reliable CMR interpretation, where clinical decisions require integrating distributed evidence across sequences, views, and temporal phases. CardioLens provides a clinically grounded testbed for developing next-generation MLLMs toward real-world clinical deployment.

CO | Apr 16, 2021

Sharp bounds for the number of regions of maxout networks and vertices of Minkowski sums

Guido Montúfar, Yue Ren, Leon Zhang

We present results on the number of linear regions of the functions that can be represented by artificial feedforward neural networks with maxout units. A rank-$k$ maxout unit is a function computing the maximum of $k$ linear functions. For networks with a single layer of maxout units, the linear regions correspond to the upper vertices of a Minkowski sum of polytopes. We obtain face counting formulas in terms of the intersection posets of tropical hypersurfaces or the number of upper faces of partial Minkowski sums, along with explicit sharp upper bounds for the number of regions for any input dimension, any number of units, and any ranks, in the cases with and without biases. Based on these results we also obtain asymptotically sharp upper bounds for networks with multiple layers.
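
As a rough illustration of the objects in this abstract (not code from the paper), the sketch below evaluates maxout units on a one-dimensional input and counts the linear regions of a single layer by scanning a grid and recording which affine piece attains the maximum in each unit. The function names, the example weights, the scan interval, and the grid resolution are all assumptions chosen for illustration; the grid scan is only a numerical check and can miss regions narrower than the grid spacing.

```python
from itertools import groupby


def maxout(x, pieces):
    """Value of a maxout unit: the maximum of the affine functions a*x + b."""
    return max(a * x + b for a, b in pieces)


def maxout_argmax(x, pieces):
    """Index of the affine piece (a, b) that attains the maximum at x."""
    values = [a * x + b for a, b in pieces]
    return values.index(max(values))


def count_regions_1d(units, lo=-5.0, hi=5.0, steps=10001):
    """Count linear regions of a single maxout layer with scalar input.

    Scans a uniform grid on [lo, hi] (an assumed interval) and counts the
    runs of consecutive grid points sharing the same activation pattern,
    i.e. the same tuple of maximizing pieces across all units.
    """
    patterns = []
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        patterns.append(tuple(maxout_argmax(x, u) for u in units))
    return sum(1 for _ in groupby(patterns))


# Hypothetical example: one rank-2 unit and one rank-3 unit, with biases.
unit_a = [(1.0, 0.0), (-1.0, 0.0)]               # max(x, -x), breakpoint at 0
unit_b = [(1.0, -1.0), (0.0, 0.0), (-1.0, -1.0)]  # breakpoints at -1 and 1

print(count_regions_1d([unit_a, unit_b]))
```

With scalar input, a rank-$k$ unit contributes at most $k - 1$ breakpoints, so this example layer can have at most $(2 - 1) + (3 - 1) + 1 = 4$ regions, and the weights above are chosen to attain that count. The bounds in the paper address arbitrary input dimension, which this one-dimensional sketch does not attempt to cover.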
