Robin Chen

CV
h-index: 1
Papers: 3
Citations: 10
Novelty: 50%
AI Score: 41

3 Papers

LG · Sep 18, 2025 · Code
Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning

Chi Liu, Derek Li, Yan Shu et al.

While large language models show promise in medical applications, achieving expert-level clinical reasoning remains challenging due to the need for both accurate answers and transparent reasoning processes. To address this challenge, we introduce Fleming-R1, a model designed for verifiable medical reasoning through three complementary innovations. First, our Reasoning-Oriented Data Strategy (RODS) combines curated medical QA datasets with knowledge-graph-guided synthesis to improve coverage of underrepresented diseases, drugs, and multi-hop reasoning chains. Second, we employ Chain-of-Thought (CoT) cold start to distill high-quality reasoning trajectories from teacher models, establishing robust inference priors. Third, we implement a two-stage Reinforcement Learning from Verifiable Rewards (RLVR) framework using Group Relative Policy Optimization, which consolidates core reasoning skills while targeting persistent failure modes through adaptive hard-sample mining. Across diverse medical benchmarks, Fleming-R1 delivers substantial parameter-efficient improvements: the 7B variant surpasses much larger baselines, while the 32B model achieves near-parity with GPT-4o and consistently outperforms strong open-source alternatives. These results demonstrate that structured data design, reasoning-oriented initialization, and verifiable reinforcement learning can advance clinical reasoning beyond simple accuracy optimization. We release Fleming-R1 publicly to promote transparent, reproducible, and auditable progress in medical AI, enabling safer deployment in high-stakes clinical environments.
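The abstract names Group Relative Policy Optimization with verifiable rewards but gives no formulas. As a rough illustration only, the sketch below shows the group-relative advantage commonly associated with GRPO: each sampled answer's reward is normalized by the mean and standard deviation of its own sampling group. The exact-match reward, the function names, and the `eps` constant are assumptions made for this example and are not taken from Fleming-R1.

```python
import math


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical rule-based reward: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps (an assumed constant) avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + eps) for r in rewards]


# One group of sampled answers to a single multiple-choice question (made-up data).
sampled_answers = ["B", "b ", "C", "A"]
rewards = [verifiable_reward(a, "B") for a in sampled_answers]
advantages = group_relative_advantages(rewards)
```

Correct answers receive positive advantages and incorrect ones negative advantages, relative to the group. When every answer in a group gets the same reward, all advantages are zero and the group provides no learning signal, which is one plausible motivation for the hard-sample mining the abstract mentions.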

CV · Nov 2, 2025
Fleming-VL: Towards Universal Medical Visual Reasoning with Multimodal LLMs

Yan Shu, Chi Liu, Robin Chen et al.

Multimodal Large Language Models (MLLMs) have demonstrated remarkable effectiveness in various general-domain scenarios, such as visual question answering and image captioning. Recently, researchers have increasingly focused on empowering MLLMs with medical conversational abilities, which hold significant promise for clinical applications. However, medical data presents unique challenges due to its heterogeneous nature -- encompassing diverse modalities including 2D images, 3D volumetric scans, and temporal video sequences. The substantial domain gap and data format inconsistencies across these modalities have hindered the development of unified medical MLLMs. To address these challenges, we propose Fleming-VL, a unified end-to-end framework for comprehensive medical visual understanding across heterogeneous modalities. Fleming-VL tackles this problem from a data-centric perspective through three key strategies: (1) scaling up pretraining by integrating long-context data from both natural and medical-specific domains; (2) complementing fine-tuning with rare medical data, including holistic video analysis and underrepresented 2D modalities such as ultrasound and dermoscopy images; (3) extending existing evaluation frameworks to incorporate 3D volumetric and video understanding benchmarks. Through supervised fine-tuning (SFT) and group relative policy optimization (GRPO), we develop Fleming-VL in multiple model scales. Extensive experiments demonstrate that Fleming-VL achieves state-of-the-art performance across multiple benchmarks, including medical VQA, video QA, and 3D medical image understanding. We publicly release Fleming-VL to promote transparent, reproducible, and auditable progress in medical AI.

CV · May 26, 2023
Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and Applications

Gianfranco Cortes, Yue Yu, Robin Chen et al.

With the advent of group equivariant convolutions in deep networks literature, spherical CNNs with $\mathsf{SO}(3)$-equivariant layers have been developed to cope with data that are samples of signals on the sphere $S^2$. One can implicitly obtain $\mathsf{SO}(3)$-equivariant convolutions on $S^2$ with significant efficiency gains by explicitly requiring gauge equivariance w.r.t. $\mathsf{SO}(2)$. In this paper, we build on this fact by introducing a higher order generalization of the gauge equivariant convolution, whose implementation is dubbed a gauge equivariant Volterra network (GEVNet). This allows us to model spatially extended nonlinear interactions within a given receptive field while still maintaining equivariance to global isometries. We prove theoretical results regarding the equivariance and construction of higher order gauge equivariant convolutions. Then, we empirically demonstrate the parameter efficiency of our model, first on computer vision benchmark data (e.g. spherical MNIST), and then in combination with a convolutional kernel network (CKN) on neuroimaging data. In the neuroimaging data experiments, the resulting two-part architecture (CKN + GEVNet) is used to automatically discriminate between patients with Lewy Body Disease (DLB), Alzheimer's Disease (AD) and Parkinson's Disease (PD) from diffusion magnetic resonance images (dMRI). The GEVNet extracts micro-architectural features within each voxel, while the CKN extracts macro-architectural features across voxels. This compound architecture is uniquely poised to exploit the intra- and inter-voxel information contained in the dMRI data, leading to improved performance over the classification results obtained from either of the individual components.
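To make "higher order" concrete, the sketch below implements a plain second-order Volterra filter on a 1D signal: the output at each position is a linear term plus a sum over pairwise products of inputs in the receptive field. It is a flat Euclidean toy and does not implement gauge equivariance or the GEVNet construction. The function name and the weights are assumptions chosen for illustration.

```python
def volterra2_conv1d(x, w1, w2):
    """Second-order Volterra filter (valid mode, no padding).

    y[n] = sum_i w1[i] * x[n+i] + sum_{i,j} w2[i][j] * x[n+i] * x[n+j]
    """
    k = len(w1)
    out = []
    for n in range(len(x) - k + 1):
        patch = x[n:n + k]
        # First-order term: an ordinary convolution (cross-correlation) response.
        linear = sum(w1[i] * patch[i] for i in range(k))
        # Second-order term: weighted pairwise interactions within the receptive field.
        quadratic = sum(
            w2[i][j] * patch[i] * patch[j] for i in range(k) for j in range(k)
        )
        out.append(linear + quadratic)
    return out


signal = [1, 2, 3]
w1 = [1, 0]            # linear kernel: passes through the first sample of each patch
w2 = [[0, 1], [0, 0]]  # quadratic kernel: product of the two samples in each patch
result = volterra2_conv1d(signal, w1, w2)  # [3, 8]
```

Setting `w2` to all zeros recovers an ordinary linear convolution, so the quadratic kernel is what carries the nonlinear interactions within the receptive field.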