Xiaoyu Bai

11papers

103citations

Novelty51%

AI Score29

Ranked #152,686 of 201,326 authors (top 76%)#47,401 in CV (top 80%)

11 Papers

CVNov 25, 2023Code

SAME++: A Self-supervised Anatomical eMbeddings Enhanced medical image registration framework using stable sampling and regularized transformation

Lin Tian, Zi Li, Fengze Liu et al. · harvard

Image registration is a fundamental medical image analysis task. Ideally, registration should focus on aligning semantically corresponding voxels, i.e., the same anatomical locations. However, existing methods often optimize similarity measures computed directly on intensities or on hand-crafted features, which lack anatomical semantic information. These similarity measures may lead to sub-optimal solutions where large deformations, complex anatomical differences, or cross-modality imagery exist. In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration building on top of a Self-supervised Anatomical eMbedding (SAM) algorithm, which is capable of computing dense anatomical correspondences between two images at the voxel level. We name our approach SAM-Enhanced registration (SAME++), which decomposes image registration into four steps: affine transformation, coarse deformation, deep non-parametric transformation, and instance optimization. Using SAM embeddings, we enhance these steps by finding more coherent correspondence and providing features with better semantic guidance. We extensively evaluated SAME++ using more than 50 labeled organs on three challenging inter-subject registration tasks of different body parts. As a complete registration framework, SAME++ markedly outperforms leading methods by $4.2\%$ - $8.2\%$ in terms of Dice score while being orders of magnitude faster than numerical optimization-based methods. Code is available at \url{https://github.com/alibaba-damo-academy/same}.

CVJul 19, 2023

SAMConvex: Fast Discrete Optimization for CT Registration using Self-supervised Anatomical Embedding and Correlation Pyramid

Zi Li, Lin Tian, Tony C. W. Mok et al. · harvard

Estimating displacement vector field via a cost volume computed in the feature space has shown great success in image registration, but it suffers excessive computation burdens. Moreover, existing feature descriptors only extract local features incapable of representing the global semantic information, which is especially important for solving large transformations. To address the discussed issues, we propose SAMConvex, a fast coarse-to-fine discrete optimization method for CT registration that includes a decoupled convex optimization procedure to obtain deformation fields based on a self-supervised anatomical embedding (SAM) feature extractor that captures both local and global information. To be specific, SAMConvex extracts per-voxel features and builds 6D correlation volumes based on SAM features, and iteratively updates a flow field by performing lookups on the correlation volumes with a coarse-to-fine scheme. SAMConvex outperforms the state-of-the-art learning-based methods and optimization-based methods over two inter-patient registration datasets (Abdomen CT and HeadNeck CT) and one intra-patient registration dataset (Lung CT). Moreover, as an optimization-based method, SAMConvex only takes $\sim2$s ($\sim5s$ with instance optimization) for one paired images.

CVFeb 11, 2023Code

Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Analysis

Yankai Jiang, Mingze Sun, Heng Guo et al.

Self-supervised learning (SSL) has recently achieved promising performance for 3D medical image analysis tasks. Most current methods follow existing SSL paradigm originally designed for photographic or natural images, which cannot explicitly and thoroughly exploit the intrinsic similar anatomical structures across varying medical images. This may in fact degrade the quality of learned deep representations by maximizing the similarity among features containing spatial misalignment information and different anatomical semantics. In this work, we propose a new self-supervised learning framework, namely Alice, that explicitly fulfills Anatomical invariance modeling and semantic alignment via elaborately combining discriminative and generative objectives. Alice introduces a new contrastive learning strategy which encourages the similarity between views that are diversely mined but with consistent high-level semantics, in order to learn invariant anatomical features. Moreover, we design a conditional anatomical feature alignment module to complement corrupted embeddings with globally matched semantics and inter-patch topology information, conditioned by the distribution of local image content, which permits to create better contrastive pairs. Our extensive quantitative experiments on three 3D medical image analysis tasks demonstrate and validate the performance superiority of Alice, surpassing the previous best SSL counterpart methods and showing promising ability for united representation learning. Codes are available at https://github.com/alibaba-damo-academy/alice.

IVJul 17, 2023

Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network

Ke Yan, Xiaoli Yin, Yingda Xia et al.

Liver tumor segmentation and classification are important tasks in computer aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and classify each lesion with improved anchor queries and a foreground-enhanced sampling loss. It also has an image-wise classifier to effectively aggregate global information and predict patient-level diagnosis. A large-scale multi-phase dataset is collected containing 939 tumor patients and 810 normal subjects. 4010 tumor instances of eight types are extensively annotated. On the non-contrast tumor screening task, PLAN achieves 95% and 96% in patient-level sensitivity and specificity. On contrast-enhanced CT, our lesion-level detection precision, recall, and classification accuracy are 92%, 89%, and 86%, outperforming widely used CNN and transformers for lesion segmentation. We also conduct a reader study on a holdout set of 250 cases. PLAN is on par with a senior human radiologist, showing the clinical significance of our results.

CVAug 9, 2023

SLPT: Selective Labeling Meets Prompt Tuning on Label-Limited Lesion Segmentation

Fan Bai, Ke Yan, Xiaoyu Bai et al.

Medical image analysis using deep learning is often challenged by limited labeled data and high annotation costs. Fine-tuning the entire network in label-limited scenarios can lead to overfitting and suboptimal performance. Recently, prompt tuning has emerged as a more promising technique that introduces a few additional tunable parameters as prompts to a task-agnostic pre-trained model, and updates only these parameters using supervision from limited labeled data while keeping the pre-trained model unchanged. However, previous work has overlooked the importance of selective labeling in downstream tasks, which aims to select the most valuable downstream samples for annotation to achieve the best performance with minimum annotation cost. To address this, we propose a framework that combines selective labeling with prompt tuning (SLPT) to boost performance in limited labels. Specifically, we introduce a feature-aware prompt updater to guide prompt tuning and a TandEm Selective LAbeling (TESLA) strategy. TESLA includes unsupervised diversity selection and supervised selection using prompt-based uncertainty. In addition, we propose a diversified visual prompt tuning strategy to provide multi-prompt-based discrepant predictions for TESLA. We evaluate our method on liver tumor segmentation and achieve state-of-the-art performance, outperforming traditional fine-tuning with only 6% of tunable parameters, also achieving 94% of full-data performance by labeling only 5% of the data.

CVJul 7, 2023

Matching in the Wild: Learning Anatomical Embeddings for Multi-Modality Images

Xiaoyu Bai, Fan Bai, Xiaofei Huo et al.

Radiotherapists require accurate registration of MR/CT images to effectively use information from both modalities. In a typical registration pipeline, rigid or affine transformations are applied to roughly align the fixed and moving images before proceeding with the deformation step. While recent learning-based methods have shown promising results in the rigid/affine step, these methods often require images with similar field-of-view (FOV) for successful alignment. As a result, aligning images with different FOVs remains a challenging task. Self-supervised landmark detection methods like self-supervised Anatomical eMbedding (SAM) have emerged as a useful tool for mapping and cropping images to similar FOVs. However, these methods are currently limited to intra-modality use only. To address this limitation and enable cross-modality matching, we propose a new approach called Cross-SAM. Our approach utilizes a novel iterative process that alternates between embedding learning and CT-MRI registration. We start by applying aggressive contrast augmentation on both CT and MRI images to train a SAM model. We then use this SAM to identify corresponding regions on paired images using robust grid-points matching, followed by a point-set based affine/rigid registration, and a deformable fine-tuning step to produce registered paired images. We use these registered pairs to enhance the matching ability of SAM, which is then processed iteratively. We use the final model for cross-modality matching tasks. We evaluated our approach on two CT-MRI affine registration datasets and found that Cross-SAM achieved robust affine registration on both datasets, significantly outperforming other methods and achieving state-of-the-art performance.

CVMar 27, 2023

An End-to-End Framework For Universal Lesion Detection With Missing Annotations

Xiaoyu Bai, Yong Xia

Fully annotated large-scale medical image datasets are highly valuable. However, because labeling medical images is tedious and requires specialized knowledge, the large-scale datasets available often have missing annotation issues. For instance, DeepLesion, a large-scale CT image dataset with labels for various kinds of lesions, is reported to have a missing annotation rate of 50\%. Directly training a lesion detector on it would suffer from false negative supervision caused by unannotated lesions. To address this issue, previous works have used sophisticated multi-stage strategies to switch between lesion mining and detector training. In this work, we present a novel end-to-end framework for mining unlabeled lesions while simultaneously training the detector. Our framework follows the teacher-student paradigm. In each iteration, the teacher model infers the input data and creates a set of predictions. High-confidence predictions are combined with partially-labeled ground truth for training the student model. On the DeepLesion dataset, using the original partially labeled training set, our model can outperform all other more complicated methods and surpass the previous best method by 2.3\% on average sensitivity and 2.7\% on average precision, achieving state-of-the-art universal lesion detection results.

CVSep 23, 2023

Tackling the Incomplete Annotation Issue in Universal Lesion Detection Task By Exploratory Training

Xiaoyu Bai, Benteng Ma, Changyang Li et al.

Universal lesion detection has great value for clinical practice as it aims to detect various types of lesions in multiple organs on medical images. Deep learning methods have shown promising results, but demanding large volumes of annotated data for training. However, annotating medical images is costly and requires specialized knowledge. The diverse forms and contrasts of objects in medical images make fully annotation even more challenging, resulting in incomplete annotations. Directly training ULD detectors on such datasets can yield suboptimal results. Pseudo-label-based methods examine the training data and mine unlabelled objects for retraining, which have shown to be effective to tackle this issue. Presently, top-performing methods rely on a dynamic label-mining mechanism, operating at the mini-batch level. However, the model's performance varies at different iterations, leading to inconsistencies in the quality of the mined labels and limits their performance enhancement. Inspired by the observation that deep models learn concepts with increasing complexity, we introduce an innovative exploratory training to assess the reliability of mined lesions over time. Specifically, we introduce a teacher-student detection model as basis, where the teacher's predictions are combined with incomplete annotations to train the student. Additionally, we design a prediction bank to record high-confidence predictions. Each sample is trained several times, allowing us to get a sequence of records for each sample. If a prediction consistently appears in the record sequence, it is likely to be a true object, otherwise it may just a noise. This serves as a crucial criterion for selecting reliable mined lesions for retraining. Our experimental results substantiate that the proposed framework surpasses state-of-the-art methods on two medical image datasets, demonstrating its superior performance.

CVJun 24, 2023

SAM++: Enhancing Anatomic Matching using Semantic Information and Structural Inference

Xiaoyu Bai, Yong Xia

Medical images like CT and MRI provide detailed information about the internal structure of the body, and identifying key anatomical structures from these images plays a crucial role in clinical workflows. Current methods treat it as a registration or key-point regression task, which has limitations in accurate matching and can only handle predefined landmarks. Recently, some methods have been introduced to address these limitations. One such method, called SAM, proposes using a dense self-supervised approach to learn a distinct embedding for each point on the CT image and achieving promising results. Nonetheless, SAM may still face difficulties when dealing with structures that have similar appearances but different semantic meanings or similar semantic meanings but different appearances. To overcome these limitations, we propose SAM++, a framework that simultaneously learns appearance and semantic embeddings with a novel fixed-points matching mechanism. We tested the SAM++ framework on two challenging tasks, demonstrating a significant improvement over the performance of SAM and outperforming other existing methods.

CVNov 25, 2023

UAE: Universal Anatomical Embedding on Multi-modality Medical Images

Xiaoyu Bai, Fan Bai, Xiaofei Huo et al.

Identifying specific anatomical structures (\textit{e.g.}, lesions or landmarks) in medical images plays a fundamental role in medical image analysis. Exemplar-based landmark detection methods are receiving increasing attention since they can detect arbitrary anatomical points in inference while do not need landmark annotations in training. They use self-supervised learning to acquire a discriminative embedding for each voxel within the image. These approaches can identify corresponding landmarks through nearest neighbor matching and has demonstrated promising results across various tasks. However, current methods still face challenges in: (1) differentiating voxels with similar appearance but different semantic meanings (\textit{e.g.}, two adjacent structures without clear borders); (2) matching voxels with similar semantics but markedly different appearance (\textit{e.g.}, the same vessel before and after contrast injection); and (3) cross-modality matching (\textit{e.g.}, CT-MRI landmark-based registration). To overcome these challenges, we propose universal anatomical embedding (UAE), which is a unified framework designed to learn appearance, semantic, and cross-modality anatomical embeddings. Specifically, UAE incorporates three key innovations: (1) semantic embedding learning with prototypical contrastive loss; (2) a fixed-point-based matching strategy; and (3) an iterative approach for cross-modality embedding learning. We thoroughly evaluated UAE across intra- and inter-modality tasks, including one-shot landmark detection, lesion tracking on longitudinal CT scans, and CT-MRI affine/rigid registration with varying field of view. Our results suggest that UAE outperforms state-of-the-art methods, offering a robust and versatile approach for landmark based medical image analysis tasks. Code and trained models are available at: \href{https://shorturl.at/bgsB3}

CVDec 13, 2020

Fully-Automated Liver Tumor Localization and Characterization from Multi-Phase MR Volumes Using Key-Slice ROI Parsing: A Physician-Inspired Approach

Bolin Lai, Yuhsuan Wu, Xiaoyu Bai et al.

Using radiological scans to identify liver tumors is crucial for proper patient treatment. This is highly challenging, as top radiologists only achieve F1 scores of roughly 80% (hepatocellular carcinoma (HCC) vs. others) with only moderate inter-rater agreement, even when using multi-phase magnetic resonance (MR) imagery. Thus, there is great impetus for computer-aided diagnosis (CAD) solutions. A critical challenge is to robustly parse a 3D MR volume to localize diagnosable regions of interest (ROI), especially for edge cases. In this paper, we break down this problem using a key-slice parser (KSP), which emulates physician workflows by first identifying key slices and then localizing their corresponding key ROIs. To achieve robustness, the KSP also uses curve-parsing and detection confidence re-weighting. We evaluate our approach on the largest multi-phase MR liver lesion test dataset to date (430 biopsy-confirmed patients). Experiments demonstrate that our KSP can localize diagnosable ROIs with high reliability: 87% patients have an average 3D overlap of >= 40% with the ground truth compared to only 79% using the best tested detector. When coupled with a classifier, we achieve an HCC vs. others F1 score of 0.801, providing a fully-automated CAD performance comparable to top human physicians.