IVMar 24, 2025Code
Rethinking Glaucoma Calibration: Voting-Based Binocular and Metadata IntegrationTaejin Jeong, Joohyeok Kim, Jaehoon Joo et al.
Glaucoma is a major cause of irreversible blindness, with significant diagnostic subjectivity. This inherent uncertainty, combined with the overconfidence of models optimized solely for accuracy can lead to fatal issues such as overdiagnosis or missing critical diseases. To ensure clinical trust, model calibration is essential for reliable predictions, yet study in this field remains limited. Existing calibration study have overlooked glaucoma's systemic associations and high diagnostic subjectivity. To overcome these limitations, we propose V-ViT (Voting-based ViT), a framework that enhances calibration by integrating a patient's binocular information and metadata. Furthermore, to mitigate diagnostic subjectivity, V-ViT utilizes an iterative dropout-based Voting System to maximize calibration performance. The proposed framework achieved state-of-the-art performance across all metrics, including the primary calibration metrics. Our results demonstrate that V-ViT effectively resolves the issue of overconfidence in predictions in glaucoma diagnosis, providing highly reliable predictions for clinical use. Our source code is available at https://github.com/starforTJ/V-ViT.
CVSep 18, 2024
Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal GuidanceJaehoon Joo, Taejin Jeong, Seongjae Hwang
Understanding how humans process visual information is one of the crucial steps for unraveling the underlying mechanism of brain activity. Recently, this curiosity has motivated the fMRI-to-image reconstruction task; given the fMRI data from visual stimuli, it aims to reconstruct the corresponding visual stimuli. Surprisingly, leveraging powerful generative models such as the Latent Diffusion Model (LDM) has shown promising results in reconstructing complex visual stimuli such as high-resolution natural images from vision datasets. Despite the impressive structural fidelity of these reconstructions, they often lack details of small objects, ambiguous shapes, and semantic nuances. Consequently, the incorporation of additional semantic knowledge, beyond mere visuals, becomes imperative. In light of this, we exploit how modern LDMs effectively incorporate multi-modal guidance (text guidance, visual guidance, and image layout) for structurally and semantically plausible image generations. Specifically, inspired by the two-streams hypothesis suggesting that perceptual and semantic information are processed in different brain regions, our framework, Brain-Streams, maps fMRI signals from these brain regions to appropriate embeddings. That is, by extracting textual guidance from semantic information regions and visual guidance from perceptual information regions, Brain-Streams provides accurate multi-modal guidance to LDMs. We validate the reconstruction ability of Brain-Streams both quantitatively and qualitatively on a real fMRI dataset comprising natural image stimuli and fMRI data.