Xiaoling Hu

h-index32

19papers

268citations

Novelty54%

AI Score51

Ranked #18,072 of 194,257 authors (top 9%)#6,560 in CV (top 11%)

19 Papers

17.3CVJul 20, 2022Code

Learning Topological Interactions for Multi-Class Medical Image Segmentation

Saumya Gupta, Xiaoling Hu, James Kaan et al.

Deep learning methods have achieved impressive performance for multi-class medical image segmentation. However, they are limited in their ability to encode topological interactions among different classes (e.g., containment and exclusion). These constraints naturally arise in biomedical images and can be crucial in improving segmentation quality. In this paper, we introduce a novel topological interaction module to encode the topological interactions into a deep neural network. The implementation is completely convolution-based and thus can be very efficient. This empowers us to incorporate the constraints into end-to-end training and enrich the feature representation of neural networks. The efficacy of the proposed method is validated on different types of interactions. We also demonstrate the generalizability of the method on both proprietary and public challenge datasets, in both 2D and 3D settings, as well as across different modalities such as CT and Ultrasound. Code is available at: https://github.com/TopoXLab/TopoInteraction

11.0CVNov 28, 2023Code

Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging

Peirong Liu, Oula Puonti, Xiaoling Hu et al.

Recent learning-based approaches have made astonishing advances in calibrated medical imaging like computerized tomography (CT), yet they struggle to generalize in uncalibrated modalities -- notably magnetic resonance (MR) imaging, where performance is highly sensitive to the differences in MR contrast, resolution, and orientation. This prevents broad applicability to diverse real-world clinical protocols. We introduce Brain-ID, an anatomical representation learning model for brain imaging. With the proposed "mild-to-severe" intra-subject generation, Brain-ID is robust to the subject-specific brain anatomy regardless of the appearance of acquired images (e.g., contrast, deformation, resolution, artifacts). Trained entirely on synthetic data, Brain-ID readily adapts to various downstream tasks through only one layer. We present new metrics to validate the intra- and inter-subject robustness of Brain-ID features, and evaluate their performance on four downstream applications, covering contrast-independent (anatomy reconstruction/contrast synthesis, brain segmentation), and contrast-dependent (super-resolution, bias field estimation) tasks. Extensive experiments on six public datasets demonstrate that Brain-ID achieves state-of-the-art performance in all tasks on different MRI modalities and CT, and more importantly, preserves its performance on low-resolution and small datasets. Code is available at https://github.com/peirong26/Brain-ID.

14.9LGJul 19, 2023Code

Confidence Estimation Using Unlabeled Data

Chen Li, Xiaoling Hu, Chao Chen

Overconfidence is a common issue for deep neural networks, limiting their deployment in real-world applications. To better estimate confidence, existing methods mostly focus on fully-supervised scenarios and rely on training labels. In this paper, we propose the first confidence estimation method for a semi-supervised setting, when most training labels are unavailable. We stipulate that even with limited training labels, we can still reasonably approximate the confidence of model on unlabeled samples by inspecting the prediction consistency through the training process. We use training consistency as a surrogate function and propose a consistency ranking loss for confidence estimation. On both image classification and segmentation tasks, our method achieves state-of-the-art performances in confidence estimation. Furthermore, we show the benefit of the proposed method through a downstream active learning task. The code is available at https://github.com/TopoXLab/consistency-ranking-loss

22.0IVJun 3, 2022

Learning Probabilistic Topological Representations Using Discrete Morse Theory

Xiaoling Hu, Dimitris Samaras, Chao Chen

Accurate delineation of fine-scale structures is a very important yet challenging problem. Existing methods use topological information as an additional training loss, but are ultimately making pixel-wise predictions. In this paper, we propose the first deep learning based method to learn topological/structural representations. We use discrete Morse theory and persistent homology to construct an one-parameter family of structures as the topological/structural representation space. Furthermore, we learn a probabilistic model that can perform inference tasks in such a topological/structural representation space. Our method generates true structures rather than pixel-maps, leading to better topological integrity in automatic segmentation tasks. It also facilitates semi-automatic interactive annotation/proofreading via the sampling of structures and structure-aware uncertainty.

9.6CVSep 4, 2024Code

Spatial Diffusion for Cell Layout Generation

Chen Li, Xiaoling Hu, Shahira Abousamra et al.

Generative models, such as GANs and diffusion models, have been used to augment training sets and boost performances in different tasks. We focus on generative models for cell detection instead, i.e., locating and classifying cells in given pathology images. One important information that has been largely overlooked is the spatial patterns of the cells. In this paper, we propose a spatial-pattern-guided generative model for cell layout generation. Specifically, a novel diffusion model guided by spatial features and generates realistic cell layouts has been proposed. We explore different density models as spatial features for the diffusion model. In downstream tasks, we show that the generated cell layouts can be used to guide the generation of high-quality pathology images. Augmenting with these images can significantly boost the performance of SOTA cell detection methods. The code is available at https://github.com/superlc1995/Diffusion-cell.

13.1CVFeb 8, 2023

Enhancing Modality-Agnostic Representations via Meta-Learning for Brain Tumor Segmentation

Aishik Konwer, Xiaoling Hu, Joseph Bae et al.

In medical vision, different imaging modalities provide complementary information. However, in practice, not all modalities may be available during inference or even training. Previous approaches, e.g., knowledge distillation or image synthesis, often assume the availability of full modalities for all patients during training; this is unrealistic and impractical due to the variability in data collection across sites. We propose a novel approach to learn enhanced modality-agnostic representations by employing a meta-learning strategy in training, even when only limited full modality samples are available. Meta-learning enhances partial modality representations to full modality representations by meta-training on partial modality data and meta-testing on limited full modality samples. Additionally, we co-supervise this feature enrichment by introducing an auxiliary adversarial learning branch. More specifically, a missing modality detector is used as a discriminator to mimic the full modality setting. Our segmentation framework significantly outperforms state-of-the-art brain tumor segmentation techniques in missing modality scenarios.

16.4CVAug 19, 2023

Calibrating Uncertainty for Semi-Supervised Crowd Counting

Chen Li, Xiaoling Hu, Shahira Abousamra et al.

Semi-supervised crowd counting is an important yet challenging task. A popular approach is to iteratively generate pseudo-labels for unlabeled data and add them to the training set. The key is to use uncertainty to select reliable pseudo-labels. In this paper, we propose a novel method to calibrate model uncertainty for crowd counting. Our method takes a supervised uncertainty estimation strategy to train the model through a surrogate function. This ensures the uncertainty is well controlled throughout the training. We propose a matching-based patch-wise surrogate function to better approximate uncertainty for crowd counting tasks. The proposed method pays a sufficient amount of attention to details, while maintaining a proper granularity. Altogether our method is able to generate reliable uncertainty estimation, high quality pseudolabels, and achieve state-of-the-art performance in semisupervised crowd counting.

5.7CVJul 21, 2022

Deep Statistic Shape Model for Myocardium Segmentation

Xiaoling Hu, Xiao Chen, Yikang Liu et al.

Accurate segmentation and motion estimation of myocardium have always been important in clinic field, which essentially contribute to the downstream diagnosis. However, existing methods cannot always guarantee the shape integrity for myocardium segmentation. In addition, motion estimation requires point correspondence on the myocardium region across different frames. In this paper, we propose a novel end-to-end deep statistic shape model to focus on myocardium segmentation with both shape integrity and boundary correspondence preserving. Specifically, myocardium shapes are represented by a fixed number of points, whose variations are extracted by Principal Component Analysis (PCA). Deep neural network is used to predict the transformation parameters (both affine and deformation), which are then used to warp the mean point cloud to the image domain. Furthermore, a differentiable rendering layer is introduced to incorporate mask supervision into the framework to learn more accurate point clouds. In this way, the proposed method is able to consistently produce anatomically reasonable segmentation mask without post processing. Additionally, the predicted point cloud guarantees boundary correspondence for sequential images, which contributes to the downstream tasks, such as the motion estimation of myocardium. We conduct several experiments to demonstrate the effectiveness of the proposed method on several benchmark datasets.

19.9IVDec 8, 2024Code

TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

Meilong Xu, Saumya Gupta, Xiaoling Hu et al.

Accurately modeling multi-class cell topology is crucial in digital pathology, as it provides critical insights into tissue structure and pathology. The synthetic generation of cell topology enables realistic simulations of complex tissue environments, enhances downstream tasks by augmenting training data, aligns more closely with pathologists' domain knowledge, and offers new opportunities for controlling and generalizing the tumor microenvironment. In this paper, we propose a novel approach that integrates topological constraints into a diffusion model to improve the generation of realistic, contextually accurate cell topologies. Our method refines the simulation of cell distributions and interactions, increasing the precision and interpretability of results in downstream tasks such as cell detection and classification. To assess the topological fidelity of generated layouts, we introduce a new metric, Topological Frechet Distance (TopoFD), which overcomes the limitations of traditional metrics like FID in evaluating topological structure. Experimental results demonstrate the effectiveness of our approach in generating multi-class cell layouts that capture intricate topological relationships. Code is available at https://github.com/Melon-Xu/TopoCellGen.

7.6CVOct 11, 2024Code

Hierarchical Uncertainty Estimation for Learning-based Registration in Neuroimaging

Xiaoling Hu, Karthik Gopinath, Peirong Liu et al.

Over recent years, deep learning based image registration has achieved impressive accuracy in many domains, including medical imaging and, specifically, human neuroimaging with magnetic resonance imaging (MRI). However, the uncertainty estimation associated with these methods has been largely limited to the application of generic techniques (e.g., Monte Carlo dropout) that do not exploit the peculiarities of the problem domain, particularly spatial modeling. Here, we propose a principled way to propagate uncertainties (epistemic or aleatoric) estimated at the level of spatial location by these methods, to the level of global transformation models, and further to downstream tasks. Specifically, we justify the choice of a Gaussian distribution for the local uncertainty modeling, and then propose a framework where uncertainties spread across hierarchical levels, depending on the choice of transformation model. Experiments on publicly available data sets show that Monte Carlo dropout correlates very poorly with the reference registration error, whereas our uncertainty estimates correlate much better. Crucially, the results also show that uncertainty-aware fitting of transformations improves the registration accuracy of brain MRI scans. Finally, we illustrate how sampling from the posterior distribution of the transformations can be used to propagate uncertainties to downstream neuroimaging tasks. Code is available at: https://github.com/HuXiaoling/Regre4Regis.

10.2CVAug 30, 2025Code

A Modality-agnostic Multi-task Foundation Model for Human Brain Imaging

Peirong Liu, Oula Puonti, Xiaoling Hu et al.

Recent learning-based approaches have made astonishing advances in calibrated medical imaging like computerized tomography (CT), yet they struggle to generalize in uncalibrated modalities -- notably magnetic resonance (MR) imaging, where performance is highly sensitive to the differences in MR contrast, resolution, and orientation. This prevents broad applicability to diverse real-world clinical protocols. Here we introduce BrainFM, a modality-agnostic, multi-task vision foundation model for human brain imaging. With the proposed "mild-to-severe" intra-subject generation and "real-synth" mix-up training strategy, BrainFM is resilient to the appearance of acquired images (e.g., modality, contrast, deformation, resolution, artifacts), and can be directly applied to five fundamental brain imaging tasks, including image synthesis for CT and T1w/T2w/FLAIR MRI, anatomy segmentation, scalp-to-cortical distance, bias field estimation, and registration. We evaluate the efficacy of BrainFM on eleven public datasets, and demonstrate its robustness and effectiveness across all tasks and input modalities. Code is available at https://github.com/jhuldr/BrainFM.

12.8CVApr 25, 2024

Registration by Regression (RbR): a framework for interpretable and flexible atlas registration

Karthik Gopinath, Xiaoling Hu, Malte Hoffmann et al.

In human neuroimaging studies, atlas registration enables mapping MRI scans to a common coordinate frame, which is necessary to aggregate data from multiple subjects. Machine learning registration methods have achieved excellent speed and accuracy but lack interpretability and flexibility at test time (since their deformation model is fixed). More recently, keypoint-based methods have been proposed to tackle these issues, but their accuracy is still subpar, particularly when fitting nonlinear transforms. Here we propose Registration by Regression (RbR), a novel atlas registration framework that: is highly robust and flexible; can be trained with cheaply obtained data; and operates on a single channel, such that it can also be used as pretraining for other tasks. RbR predicts the (x, y, z) atlas coordinates for every voxel of the input scan (i.e., every voxel is a keypoint), and then uses closed-form expressions to quickly fit transforms using a wide array of possible deformation models, including affine and nonlinear (e.g., Bspline, Demons, invertible diffeomorphic models, etc.). Robustness is provided by the large number of voxels informing the registration and can be further increased by robust estimators like RANSAC. Experiments on independent public datasets show that RbR yields more accurate registration than competing keypoint approaches, over a wide range of deformation models.

5.2CVMar 22, 2024

Learning Topological Representations for Deep Image Understanding

Xiaoling Hu

In many scenarios, especially biomedical applications, the correct delineation of complex fine-scaled structures such as neurons, tissues, and vessels is critical for downstream analysis. Despite the strong predictive power of deep learning methods, they do not provide a satisfactory representation of these structures, thus creating significant barriers in scalable annotation and downstream analysis. In this dissertation, we tackle such challenges by proposing novel representations of these topological structures in a deep learning framework. We leverage the mathematical tools from topological data analysis, i.e., persistent homology and discrete Morse theory, to develop principled methods for better segmentation and uncertainty estimation, which will become powerful tools for scalable annotation.

6.7CLFeb 5, 2025

MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction

Xiao Hu, Eric Liu, Weizhou Wang et al.

Retrieval-Augmented Generation (RAG) offers a solution to mitigate hallucinations in Large Language Models (LLMs) by grounding their outputs to knowledge retrieved from external sources. The use of private resources and data in constructing these external data stores can expose them to risks of extraction attacks, in which attackers attempt to steal data from these private databases. Existing RAG extraction attacks often rely on manually crafted prompts, which limit their effectiveness. In this paper, we introduce a framework called MARAGE for optimizing an adversarial string that, when appended to user queries submitted to a target RAG system, causes outputs containing the retrieved RAG data verbatim. MARAGE leverages a continuous optimization scheme that integrates gradients from multiple models with different architectures simultaneously to enhance the transferability of the optimized string to unseen models. Additionally, we propose a strategy that emphasizes the initial tokens in the target RAG data, further improving the attack's generalizability. Evaluations show that MARAGE consistently outperforms both manual and optimization-based baselines across multiple LLMs and RAG datasets, while maintaining robust transferability to previously unseen models. Moreover, we conduct probing tasks to shed light on the reasons why MARAGE is more effective compared to the baselines and to analyze the impact of our approach on the model's internal state.

10.2CVDec 5, 2025

LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models

Qingqiao Hu, Weimin Lyu, Meilong Xu et al.

Whole Slide Image (WSI) understanding is fundamentally challenging due to its gigapixel scale and the extreme sparsity of diagnostically relevant regions. Unlike human experts who primarily rely on key areas to arrive at a diagnosis, existing slide-level multimodal large language models (MLLMs) for pathology rely on heavy slide-level encoders that process thousands of patch features in a brute-force manner, resulting in excessive computational cost. In this work, we revisit the WSI-language modeling paradigm and show that tile-level features exhibit strong global and local redundancy, whereas only a small subset of tiles are truly task-relevant. Motivated by this observation, we introduce an efficient MLLM framework, called LoC-Path, that replaces the expensive slide-level encoder with redundancy-reducing modules. We first design a Sparse Token Merger (STM) and an MAE-pretrained resampler to remove local redundancy and compress globally redundant tile tokens into a compact slide-level representation set. We then propose a Cross-Attention Routing Adapter (CARA) and a Token Importance Scorer (TIS) to integrate the compressed visual representation with the language model in a computation-efficient manner. Extensive experiments demonstrate that our approach achieves performance comparable to existing state-of-the-art whole-slide MLLMs, while requiring significantly lower computation and memory.

3.6CVSep 27, 2025

Test-time Uncertainty Estimation for Medical Image Registration via Transformation Equivariance

Lin Tian, Xiaoling Hu, Juan Eugenio Iglesias · harvard

Accurate image registration is essential for downstream applications, yet current deep registration networks provide limited indications of whether and when their predictions are reliable. Existing uncertainty estimation strategies, such as Bayesian methods, ensembles, or MC dropout, require architectural changes or retraining, limiting their applicability to pretrained registration networks. Instead, we propose a test-time uncertainty estimation framework that is compatible with any pretrained networks. Our framework is grounded in the transformation equivariance property of registration, which states that the true mapping between two images should remain consistent under spatial perturbations of the input. By analyzing the variance of network predictions under such perturbations, we derive a theoretical decomposition of perturbation-based uncertainty in registration. This decomposition separates into two terms: (i) an intrinsic spread, reflecting epistemic noise, and (ii) a bias jitter, capturing how systematic error drifts under perturbations. Across four anatomical structures (brain, cardiac, abdominal, and lung) and multiple registration models (uniGradICON, SynthMorph), the uncertainty maps correlate consistently with registration errors and highlight regions requiring caution. Our framework turns any pretrained registration network into a risk-aware tool at test time, placing medical image registration one step closer to safe deployment in clinical and large-scale research settings.

3.6CVMay 27, 2025

Scalable Segmentation for Ultra-High-Resolution Brain MR Images

Xiaoling Hu, Peirong Liu, Dina Zemlyanker et al.

Although deep learning has shown great success in 3D brain MRI segmentation, achieving accurate and efficient segmentation of ultra-high-resolution brain images remains challenging due to the lack of labeled training data for fine-scale anatomical structures and high computational demands. In this work, we propose a novel framework that leverages easily accessible, low-resolution coarse labels as spatial references and guidance, without incurring additional annotation cost. Instead of directly predicting discrete segmentation maps, our approach regresses per-class signed distance transform maps, enabling smooth, boundary-aware supervision. Furthermore, to enhance scalability, generalizability, and efficiency, we introduce a scalable class-conditional segmentation strategy, where the model learns to segment one class at a time conditioned on a class-specific input. This novel design not only reduces memory consumption during both training and testing, but also allows the model to generalize to unseen anatomical classes. We validate our method through comprehensive experiments on both synthetic and real-world datasets, demonstrating its superior performance and scalability compared to conventional segmentation approaches.

4.8IVFeb 7, 2022Code

A Topology-Attention ConvLSTM Network and Its Application to EM Images

Jiaqi Yang, Xiaoling Hu, Chao Chen et al.

Structural accuracy of segmentation is important for finescale structures in biomedical images. We propose a novel TopologyAttention ConvLSTM Network (TACNet) for 3D image segmentation in order to achieve high structural accuracy for 3D segmentation tasks. Specifically, we propose a Spatial Topology-Attention (STA) module to process a 3D image as a stack of 2D image slices and adopt ConvLSTM to leverage contextual structure information from adjacent slices. In order to effectively transfer topology-critical information across slices, we propose an Iterative-Topology Attention (ITA) module that provides a more stable topology-critical map for segmentation. Quantitative and qualitative results show that our proposed method outperforms various baselines in terms of topology-aware evaluation metrics.

12.1CVDec 15, 2021

Structure-Aware Image Segmentation with Homotopy Warping

Xiaoling Hu

Besides per-pixel accuracy, topological correctness is also crucial for the segmentation of images with fine-scale structures, e.g., satellite images and biomedical images. In this paper, by leveraging the theory of digital topology, we identify pixels in an image that are critical for topology. By focusing on these critical pixels, we propose a new homotopy warping loss to train deep image segmentation networks for better topological accuracy. To efficiently identify these topologically critical pixels, we propose a new algorithm exploiting the distance transform. The proposed algorithm, as well as the loss function, naturally generalize to different topological structures in both 2D and 3D settings. The proposed loss function helps deep nets achieve better performance in terms of topology-aware metrics, outperforming state-of-the-art structure/topology-aware segmentation methods.