Namjoon Kim

76.1 CV Mar 28

MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

Woohyeon Park, Jaeik Kim, Sunghwan Steve Cho et al.

Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative, lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering clear visual evidence that is consistent with the model's reasoning. Through this staged design, MEDIC-AD steadily improves performance across anomaly detection, symptom tracking, and anomaly segmentation, achieving state-of-the-art results compared with both closed-source and medical-specialized baselines. Evaluations on longitudinal clinical data collected from real hospital workflows further show that MEDIC-AD delivers stable predictions and clinically faithful explanations in practical patient-monitoring and decision-support settings.

LG Mar 5

LUMINA: Laplacian-Unifying Mechanism for Interpretable Neurodevelopmental Analysis via Quad-Stream GCN

Minkyung Cha, Jooyoung Bae, Jaewon Jung et al.

Functional Magnetic Resonance Imaging (fMRI) has become a classic way of measuring brain activity, and the recent trend is shifting toward using fMRI brain data for AI-driven diagnosis. Given that the brain functions not as a set of discrete parts but as an interconnected whole, graph-based architectures, represented by the Graph Convolutional Network (GCN), have emerged as a dominant framework for such tasks, since they can treat ROIs as dynamically interconnected nodes and extract the relational structure between them. Ironically, however, it is the very nature of the GCN architecture that limits its performance. The mathematical foundation of the GCN, effective for capturing global regularities, comes with a tradeoff: by repeatedly smoothing features across connected nodes, traditional GCNs tend to blur out the contrastive dynamics that might be crucial in identifying certain neurological disorders. To break through this structural bottleneck, we propose LUMINA, a Laplacian-Unifying Mechanism for Interpretable Neurodevelopmental Analysis. Our model is a Quad-Stream GCN that employs a bipolar ReLU activation and a dual-spectrum graph Laplacian filtering mechanism, thereby capturing heterogeneous dynamics that are often blurred out in conventional GCNs. In doing so, it preserves the diverse range and characteristics of neural connections in each fMRI sample. Through 5-fold cross-validation on the ADHD200 (N=144) and ABIDE (N=579) datasets, LUMINA demonstrates stable diagnostic performance on two of the most critical neurodevelopmental disorders in childhood, ADHD and ASD, outperforming existing models with accuracies of 84.66% and 88.41%, respectively.
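The smoothing tradeoff described in the LUMINA abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not define "bipolar ReLU", "dual-spectrum filtering", or how the four streams are formed. The sketch assumes a low-pass filter (the standard GCN propagation matrix), a complementary high-pass filter, and a ReLU applied separately to the positive and negative parts of the signal. All function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Standard GCN propagation matrix: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def low_pass(adj, x):
    """Neighborhood averaging: emphasizes what connected nodes share."""
    return normalized_adjacency(adj) @ x


def high_pass(adj, x):
    """Complement of the low-pass filter: emphasizes how a node
    differs from its neighborhood (assumed form, not from the paper)."""
    return x - normalized_adjacency(adj) @ x


def bipolar_relu(x):
    """Assumed reading of 'bipolar ReLU': keep positive and negative
    parts as two separate non-negative channels."""
    return np.maximum(x, 0.0), np.maximum(-x, 0.0)


def four_streams(adj, x):
    """Two filters times two polarities gives four feature streams
    (one possible interpretation of 'quad-stream')."""
    lo_pos, lo_neg = bipolar_relu(low_pass(adj, x))
    hi_pos, hi_neg = bipolar_relu(high_pass(adj, x))
    return np.concatenate([lo_pos, lo_neg, hi_pos, hi_neg], axis=1)


if __name__ == "__main__":
    # Path graph 0-1-2-3 with a strongly contrastive (alternating) signal.
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    x = np.array([[1.0], [-1.0], [1.0], [-1.0]])
    print("input norm:    ", np.linalg.norm(x))
    print("low-pass norm: ", np.linalg.norm(low_pass(adj, x)))
    print("high-pass norm:", np.linalg.norm(high_pass(adj, x)))
    print("stream shape:  ", four_streams(adj, x).shape)
```

On the alternating signal, the low-pass output has a smaller norm than the input while the high-pass output retains the contrast between neighbors, which is the kind of information the abstract says repeated smoothing tends to blur.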

2 Papers