Young Seok Jeon

CV
h-index24
5papers
11citations
Novelty58%
AI Score45

5 Papers

IVJul 29, 2022
FCSN: Global Context Aware Segmentation by Learning the Fourier Coefficients of Objects in Medical Images

Young Seok Jeon, Hongfei Yang, Mengling Feng

The encoder-decoder model is a commonly used Deep Neural Network (DNN) model for medical image segmentation. Conventional encoder-decoder models make pixel-wise predictions focusing heavily on local patterns around the pixel. This makes it challenging to give segmentation that preserves the object's shape and topology, which often requires an understanding of the global context of the object. In this work, we propose a Fourier Coefficient Segmentation Network~(FCSN) -- a novel DNN-based model that segments an object by learning the complex Fourier coefficients of the object's masks. The Fourier coefficients are calculated by integrating over the whole contour. Therefore, for our model to make a precise estimation of the coefficients, the model is motivated to incorporate the global context of the object, leading to a more accurate segmentation of the object's shape. This global context awareness also makes our model robust to unseen local perturbations during inference, such as additive noise or motion blur that are prevalent in medical images. When FCSN is compared with other state-of-the-art models (UNet+, DeepLabV3+, UNETR) on 3 medical image segmentation tasks (ISIC\_2018, RIM\_CUP, RIM\_DISC), FCSN attains significantly lower Hausdorff scores of 19.14 (6\%), 17.42 (6\%), and 9.16 (14\%) on the 3 tasks, respectively. Moreover, FCSN is lightweight by discarding the decoder module, which incurs significant computational overhead. FCSN only requires 22.2M parameters, 82M and 10M fewer parameters than UNETR and DeepLabV3+. FCSN attains inference and training speeds of 1.6ms/img and 6.3ms/img, that is 8$\times$ and 3$\times$ faster than UNet and UNETR.

CVNov 12, 2025
Feature Quality and Adaptability of Medical Foundation Models: A Comparative Evaluation for Radiographic Classification and Segmentation

Frank Li, Theo Dapamede, Mohammadreza Chavoshi et al.

Foundation models (FMs) promise to generalize medical imaging, but their effectiveness varies. It remains unclear how pre-training domain (medical vs. general), paradigm (e.g., text-guided), and architecture influence embedding quality, hindering the selection of optimal encoders for specific radiology tasks. To address this, we evaluate vision encoders from eight medical and general-domain FMs for chest X-ray analysis. We benchmark classification (pneumothorax, cardiomegaly) and segmentation (pneumothorax, cardiac boundary) using linear probing and fine-tuning. Our results show that domain-specific pre-training provides a significant advantage; medical FMs consistently outperformed general-domain models in linear probing, establishing superior initial feature quality. However, feature utility is highly task-dependent. Pre-trained embeddings were strong for global classification and segmenting salient anatomy (e.g., heart). In contrast, for segmenting complex, subtle pathologies (e.g., pneumothorax), all FMs performed poorly without significant fine-tuning, revealing a critical gap in localizing subtle disease. Subgroup analysis showed FMs use confounding shortcuts (e.g., chest tubes for pneumothorax) for classification, a strategy that fails for precise segmentation. We also found that expensive text-image alignment is not a prerequisite; image-only (RAD-DINO) and label-supervised (Ark+) FMs were among top performers. Notably, a supervised, end-to-end baseline remained highly competitive, matching or exceeding the best FMs on segmentation tasks. These findings show that while medical pre-training is beneficial, architectural choices (e.g., multi-scale) are critical, and pre-trained features are not universally effective, especially for complex localization tasks where supervised models remain a strong alternative.

IVJan 18, 2025Code
No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling

Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.

3D models surpass 2D models in CT/MRI segmentation by effectively capturing inter-slice relationships. However, the added depth dimension substantially increases memory consumption. While patch-based training alleviates memory constraints, it significantly slows down the inference speed due to the sliding window (SW) approach. We propose No-More-Sliding-Window (NMSW), a novel end-to-end trainable framework that enhances the efficiency of generic 3D segmentation backbone during an inference step by eliminating the need for SW. NMSW employs a differentiable Top-k module to selectively sample only the most relevant patches, thereby minimizing redundant computations. When patch-level predictions are insufficient, the framework intelligently leverages coarse global predictions to refine results. Evaluated across 3 tasks using 3 segmentation backbones, NMSW achieves competitive accuracy compared to SW inference while significantly reducing computational complexity by 91% (88.0 to 8.00 TMACs). Moreover, it delivers a 9.1x faster inference on the H100 GPU (99.0 to 8.3 sec) and a 11.1x faster inference on the Xeon Gold CPU (2110 to 189 sec). NMSW is model-agnostic, further boosting efficiency when integrated with any existing efficient segmentation backbones. The code is avaialble: https://github.com/Youngseok0001/open_nmsw.

43.8CVMay 9
MultiMedVision: Multi-Modal Medical Vision Framework

Frank Li, Bardia Khosravi, Mohammadreza Chavoshi et al.

Multi-modal medical imaging enables comprehensive diagnostics, yet current foundation models process 2D (e.g. X-ray) and 3D (e.g. CT) data with separate, dimensionality-specific architectures. We present MultiMedVision, a unified framework for joint 2D/3D representation learning built on a Sparse Vision Transformer. Our model uses 3D Rotary Positional Embeddings and variable-length sequence packing to process mixed-modality batches natively within a shared latent space, without modality-specific adapters or treating 3D volumes as 2D slice sequences. Trained with a self-supervised objective on chest X-rays (MIMIC-CXR) and CT scans (CT-RATE), and using a single shared encoder with 5x less data, MultiMedVision achieves competitive performance on both 2D benchmarks (Macro AUROC 0.82 on MIMIC, 0.84 on CheXpert) and 3D tasks (0.85 on CT-RATE). Analysis of the learned representations reveals coexisting modality-specific and shared feature subspaces, demonstrating that unified cross-dimensional representation learning is feasible without sacrificing modality-specific performance.

CVMar 27, 2024
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior

Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.

Imposing key anatomical features, such as the number of organs, their shapes and relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include broadening the effective receptive field (ERF) size with data-intensive modules, or introducing anatomical constraints that scales poorly to multi-organ segmentation. We introduce a novel architecture called the Anatomy-Informed Cascaded Segmentation Network (AIC-Net). AIC-Net incorporates a learnable input termed "Anatomical Prior", which can be adapted to patient-specific anatomy using a differentiable spatial deformation. The deformed prior later guides decoder layers towards more anatomy-informed predictions. We repeat this process at a local patch level to enhance the representation of intricate objects, resulting in a cascaded network structure. AIC-Net is a general method that enhances any existing segmentation models to be more anatomy-aware. We have validated the performance of AIC-Net, with various backbones, on two multi-organ segmentation tasks: abdominal organs and vertebrae. For each respective task, our benchmarks demonstrate improved dice score and Hausdorff distance.