CVOct 25, 2022Code
Geo-SIC: Learning Deformable Geometric Shapes in Deep Image ClassifiersJian Wang, Miaomiao Zhang
Deformable shapes provide important and complex geometric features of objects presented in images. However, such information is oftentimes missing or underutilized as implicit knowledge in many image analysis tasks. This paper presents Geo-SIC, the first deep learning model to learn deformable shapes in a deformation space for an improved performance of image classification. We introduce a newly designed framework that (i) simultaneously derives features from both image and latent shape spaces with large intra-class variations; and (ii) gains increased model interpretability by allowing direct access to the underlying geometric features of image data. In particular, we develop a boosted classification network, equipped with an unsupervised learning of geometric shape representations characterized by diffeomorphic transformations within each class. In contrast to previous approaches using pre-extracted shapes, our model provides a more fundamental approach by naturally learning the most relevant shape features jointly with an image classifier. We demonstrate the effectiveness of our method on both simulated 2D images and real 3D brain magnetic resonance (MR) images. Experimental results show that our model substantially improves the image classification accuracy with an additional benefit of increased model interpretability. Our code is publicly available at https://github.com/jw4hv/Geo-SIC
CVMar 8, 2023
MetaMorph: Learning Metamorphic Image Transformation With Appearance ChangesJian Wang, Jiarui Xing, Jason Druzgal et al.
This paper presents a novel predictive model, MetaMorph, for metamorphic registration of images with appearance changes (i.e., caused by brain tumors). In contrast to previous learning-based registration methods that have little or no control over appearance-changes, our model introduces a new regularization that can effectively suppress the negative effects of appearance changing areas. In particular, we develop a piecewise regularization on the tangent space of diffeomorphic transformations (also known as initial velocity fields) via learned segmentation maps of abnormal regions. The geometric transformation and appearance changes are treated as joint tasks that are mutually beneficial. Our model MetaMorph is more robust and accurate when searching for an optimal registration solution under the guidance of segmentation, which in turn improves the segmentation performance by providing appropriately augmented training labels. We validate MetaMorph on real 3D human brain tumor magnetic resonance imaging (MRI) scans. Experimental results show that our model outperforms the state-of-the-art learning-based registration models. The proposed MetaMorph has great potential in various image-guided clinical interventions, e.g., real-time image-guided navigation systems for tumor removal surgery.
IVNov 11, 2022
Multitask Learning for Improved Late Mechanical Activation Detection of Heart from Cine DENSE MRIJiarui Xing, Shuo Wang, Kenneth C. Bilchick et al.
The selection of an optimal pacing site, which is ideally scar-free and late activated, is critical to the response of cardiac resynchronization therapy (CRT). Despite the success of current approaches formulating the detection of such late mechanical activation (LMA) regions as a problem of activation time regression, their accuracy remains unsatisfactory, particularly in cases where myocardial scar exists. To address this issue, this paper introduces a multi-task deep learning framework that simultaneously estimates LMA amount and classify the scar-free LMA regions based on cine displacement encoding with stimulated echoes (DENSE) magnetic resonance imaging (MRI). With a newly introduced auxiliary LMA region classification sub-network, our proposed model shows more robustness to the complex pattern cause by myocardial scar, significantly eliminates their negative effects in LMA detection, and in turn improves the performance of scar classification. To evaluate the effectiveness of our method, we tests our model on real cardiac MR images and compare the predicted LMA with the state-of-the-art approaches. It shows that our approach achieves substantially increased accuracy. In addition, we employ the gradient-weighted class activation mapping (Grad-CAM) to visualize the feature maps learned by all methods. Experimental results suggest that our proposed model better recognizes the LMA region pattern.
IVNov 11, 2022
Joint Deep Learning for Improved Myocardial Scar Detection from Cardiac MRIJiarui Xing, Shuo Wang, Kenneth C. Bilchick et al.
Automated identification of myocardial scar from late gadolinium enhancement cardiac magnetic resonance images (LGE-CMR) is limited by image noise and artifacts such as those related to motion and partial volume effect. This paper presents a novel joint deep learning (JDL) framework that improves such tasks by utilizing simultaneously learned myocardium segmentations to eliminate negative effects from non-region-of-interest areas. In contrast to previous approaches treating scar detection and myocardium segmentation as separate or parallel tasks, our proposed method introduces a message passing module where the information of myocardium segmentation is directly passed to guide scar detectors. This newly designed network will efficiently exploit joint information from the two related tasks and use all available sources of myocardium segmentation to benefit scar identification. We demonstrate the effectiveness of JDL on LGE-CMR images for automated left ventricular (LV) scar detection, with great potential to improve risk prediction in patients with both ischemic and non-ischemic heart disease and to improve response rates to cardiac resynchronization therapy (CRT) for heart failure patients. Experimental results show that our proposed approach outperforms multiple state-of-the-art methods, including commonly used two-step segmentation-classification networks, and multitask learning schemes where subtasks are indirectly interacted.
CVJul 15, 2024Code
TLRN: Temporal Latent Residual Networks For Large Deformation Image RegistrationNian Wu, Jiarui Xing, Miaomiao Zhang
This paper presents a novel approach, termed {\em Temporal Latent Residual Network (TLRN)}, to predict a sequence of deformation fields in time-series image registration. The challenge of registering time-series images often lies in the occurrence of large motions, especially when images differ significantly from a reference (e.g., the start of a cardiac cycle compared to the peak stretching phase). To achieve accurate and robust registration results, we leverage the nature of motion continuity and exploit the temporal smoothness in consecutive image frames. Our proposed TLRN highlights a temporal residual network with residual blocks carefully designed in latent deformation spaces, which are parameterized by time-sequential initial velocity fields. We treat a sequence of residual blocks over time as a dynamic training system, where each block is designed to learn the residual function between desired deformation features and current input accumulated from previous time frames. We validate the effectivenss of TLRN on both synthetic data and real-world cine cardiac magnetic resonance (CMR) image videos. Our experimental results shows that TLRN is able to achieve substantially improved registration accuracy compared to the state-of-the-art. Our code is publicly available at https://github.com/nellie689/TLRN.
CVJul 19, 2024
Interior Object Geometry via Fitted FramesStephen M. Pizer, Zhiyuan Liu, Junjie Zhao et al.
We propose a means of computing fitted frames on the boundary and in the interior of objects and using them to provide the basis for producing geometric features from them that are not only alignment-free but most importantly can be made to correspond locally across a population of objects. We describe a representation targeted for anatomic objects which is designed to enable this strong locational correspondence within object populations and thus to provide powerful object statistics. It accomplishes this by understanding an object as the diffeomorphic deformation of the closure of the interior of an ellipsoid and by using a skeletal representation fitted throughout the deformation to produce a model of the target object, where the object is provided initially in the form of a boundary mesh. Via classification performance on hippocampi shape between individuals with a disorder vs. others, we compare our method to two state-of-theart methods for producing object representations that are intended to capture geometric correspondence across a population of objects and to yield geometric features useful for statistics, and we show notably improved classification performance by this new representation, which we call the evolutionary s-rep. The geometric features that are derived from each of the representations, especially via fitted frames, are discussed.
CVMar 13, 2023
NeurEPDiff: Neural Operators to Predict Geodesics in Deformation SpacesNian Wu, Miaomiao Zhang
This paper presents NeurEPDiff, a novel network to fast predict the geodesics in deformation spaces generated by a well known Euler-Poincaré differential equation (EPDiff). To achieve this, we develop a neural operator that for the first time learns the evolving trajectory of geodesic deformations parameterized in the tangent space of diffeomorphisms(a.k.a velocity fields). In contrast to previous methods that purely fit the training images, our proposed NeurEPDiff learns a nonlinear mapping function between the time-dependent velocity fields. A composition of integral operators and smooth activation functions is formulated in each layer of NeurEPDiff to effectively approximate such mappings. The fact that NeurEPDiff is able to rapidly provide the numerical solution of EPDiff (given any initial condition) results in a significantly reduced computational cost of geodesic shooting of diffeomorphisms in a high-dimensional image space. Additionally, the properties of discretiztion/resolution-invariant of NeurEPDiff make its performance generalizable to multiple image resolutions after being trained offline. We demonstrate the effectiveness of NeurEPDiff in registering two image datasets: 2D synthetic data and 3D brain resonance imaging (MRI). The registration accuracy and computational efficiency are compared with the state-of-the-art diffeomophic registration algorithms with geodesic shooting.
CVJul 2, 2024Code
LaMoD: Latent Motion Diffusion Model For Myocardial Strain GenerationJiarui Xing, Nivetha Jayakumar, Nian Wu et al.
Motion and deformation analysis of cardiac magnetic resonance (CMR) imaging videos is crucial for assessing myocardial strain of patients with abnormal heart functions. Recent advances in deep learning-based image registration algorithms have shown promising results in predicting motion fields from routinely acquired CMR sequences. However, their accuracy often diminishes in regions with subtle appearance changes, with errors propagating over time. Advanced imaging techniques, such as displacement encoding with stimulated echoes (DENSE) CMR, offer highly accurate and reproducible motion data but require additional image acquisition, which poses challenges in busy clinical flows. In this paper, we introduce a novel Latent Motion Diffusion model (LaMoD) to predict highly accurate DENSE motions from standard CMR videos. More specifically, our method first employs an encoder from a pre-trained registration network that learns latent motion features (also considered as deformation-based shape features) from image sequences. Supervised by the ground-truth motion provided by DENSE, LaMoD then leverages a probabilistic latent diffusion model to reconstruct accurate motion from these extracted features. Experimental results demonstrate that our proposed method, LaMoD, significantly improves the accuracy of motion analysis in standard CMR images; hence improving myocardial strain analysis in clinical settings for cardiac patients. Our code is publicly available at https://github.com/jr-xing/LaMoD.
CVDec 3, 2025Code
Learning Group Actions In Disentangled Latent Image RepresentationsFarhana Hossain Swarnali, Miaomiao Zhang, Tonmoy Hossain
Modeling group actions on latent representations enables controllable transformations of high-dimensional image data. Prior works applying group-theoretic priors or modeling transformations typically operate in the high-dimensional data space, where group actions apply uniformly across the entire input, making it difficult to disentangle the subspace that varies under transformations. While latent-space methods offer greater flexibility, they still require manual partitioning of latent variables into equivariant and invariant subspaces, limiting the ability to robustly learn and operate group actions within the representation space. To address this, we introduce a novel end-to-end framework that for the first time learns group actions on latent image manifolds, automatically discovering transformation-relevant structures without manual intervention. Our method uses learnable binary masks with straight-through estimation to dynamically partition latent representations into transformation-sensitive and invariant components. We formulate this within a unified optimization framework that jointly learns latent disentanglement and group transformation mappings. The framework can be seamlessly integrated with any standard encoder-decoder architecture. We validate our approach on five 2D/3D image datasets, demonstrating its ability to automatically learn disentangled latent factors for group actions in diverse data, while downstream classification tasks confirm the effectiveness of the learned representations. Our code is publicly available at https://github.com/farhanaswarnali/Learning-Group-Actions-In-Disentangled-Latent-Image-Representations .
IVFeb 27, 2023
Multimodal Deep Learning to Differentiate Tumor Recurrence from Treatment Effect in Human GlioblastomaTonmoy Hossain, Zoraiz Qureshi, Nivetha Jayakumar et al.
Differentiating tumor progression (TP) from treatment-related necrosis (TN) is critical for clinical management decisions in glioblastoma (GBM). Dynamic FDG PET (dPET), an advance from traditional static FDG PET, may prove advantageous in clinical staging. dPET includes novel methods of a model-corrected blood input function that accounts for partial volume averaging to compute parametric maps that reveal kinetic information. In a preliminary study, a convolution neural network (CNN) was trained to predict classification accuracy between TP and TN for $35$ brain tumors from $26$ subjects in the PET-MR image space. 3D parametric PET Ki (from dPET), traditional static PET standardized uptake values (SUV), and also the brain tumor MR voxels formed the input for the CNN. The average test accuracy across all leave-one-out cross-validation iterations adjusting for class weights was $0.56$ using only the MR, $0.65$ using only the SUV, and $0.71$ using only the Ki voxels. Combining SUV and MR voxels increased the test accuracy to $0.62$. On the other hand, MR and Ki voxels increased the test accuracy to $0.74$. Thus, dPET features alone or with MR features in deep learning models would enhance prediction accuracy in differentiating TP vs TN in GBM.
CVJan 21
Anatomically Guided Latent Diffusion for Brain MRI Progression ModelingCheng Wan, Bahram Jafrasteh, Ehsan Adeli et al.
Accurately modeling longitudinal brain MRI progression is crucial for understanding neurodegenerative diseases and predicting individualized structural changes. Existing state-of-the-art approaches, such as Brain Latent Progression (BrLP), often use multi-stage training pipelines with auxiliary conditioning modules but suffer from architectural complexity, suboptimal use of conditional clinical covariates, and limited guarantees of anatomical consistency. We propose Anatomically Guided Latent Diffusion Model (AG-LDM), a segmentation-guided framework that enforces anatomically consistent progression while substantially simplifying the training pipeline. AG-LDM conditions latent diffusion by directly fusing baseline anatomy, noisy follow-up states, and clinical covariates at the input level, a strategy that avoids auxiliary control networks by learning a unified, end-to-end model that represents both anatomy and progression. A lightweight 3D tissue segmentation model (WarpSeg) provides explicit anatomical supervision during both autoencoder fine-tuning and diffusion model training, ensuring consistent brain tissue boundaries and morphometric fidelity. Experiments on 31,713 ADNI longitudinal pairs and zero-shot evaluation on OASIS-3 demonstrate that AG-LDM matches or surpasses more complex diffusion models, achieving state-of-the-art image quality and 15-20\% reduction in volumetric errors in generated images. AG-LDM also exhibits markedly stronger utilization of temporal and clinical covariates (up to 31.5x higher sensitivity than BrLP) and generates biologically plausible counterfactual trajectories, accurately capturing hallmarks of Alzheimer's progression such as limbic atrophy and ventricular expansion. These results highlight AG-LDM as an efficient, anatomically grounded framework for reliable brain MRI progression modeling.
NISep 28, 2023
Design of JiuTian Intelligent Network Simulation PlatformLei Zhao, Miaomiao Zhang, Guangyu Li et al.
This paper introduced the JiuTian Intelligent Network Simulation Platform, which can provide wireless communication simulation data services for the Open Innovation Platform. The platform contains a series of scalable simulator functionalities, offering open services that enable users to use reinforcement learning algorithms for model training and inference based on simulation environments and data. Additionally, it allows users to address optimization tasks in different scenarios by uploading and updating parameter configurations. The platform and its open services were primarily introduced from the perspectives of background, overall architecture, simulator, business scenarios, and future directions.
CVApr 24Code
Generative Modeling of Neurodegenerative Brain Anatomy with 4D Longitudinal Diffusion ModelNivetha Jayakumar, Swakshar Deb, Bahram Jafrasteh et al.
Understanding and predicting the progression of neurodegenerative diseases remains a major challenge in medical AI, with significant implications for early diagnosis, disease monitoring, and treatment planning. However, most available longitudinal neuroimaging datasets are temporally sparse with a few follow-up scans per subject. This scarcity of temporal data limits our ability to model and accurately capture the continuous anatomical changes related to disease progression in individual subjects. To address this problem, we propose a novel 4D (3DxT) diffusion-based generative framework that effectively models and synthesizes longitudinal brain anatomy over time, conditioned on available clinical variables such as health status, age, sex, and other relevant factors. Moreover, while most current approaches focus on manipulating image intensity or texture, our method explicitly learns the data distribution of topology-preserving spatiotemporal deformations to effectively capture the geometric changes of brain structures over time. This design enables the realistic generation of future anatomical states and the reconstruction of anatomically consistent disease trajectories, providing a more faithful representation of longitudinal brain changes. We validate our model through both synthetic sequence generation and downstream longitudinal disease classification, as well as brain segmentation. Experiments on two large-scale longitudinal neuroimage datasets demonstrate that our method outperforms state-of-the-art baselines in generating anatomically accurate, temporally consistent, and clinically meaningful brain trajectories. Our code is available on Github.
CVSep 6, 2023
SADIR: Shape-Aware Diffusion Models for 3D Image ReconstructionNivetha Jayakumar, Tonmoy Hossain, Miaomiao Zhang
3D image reconstruction from a limited number of 2D images has been a long-standing challenge in computer vision and image analysis. While deep learning-based approaches have achieved impressive performance in this area, existing deep networks often fail to effectively utilize the shape structures of objects presented in images. As a result, the topology of reconstructed objects may not be well preserved, leading to the presence of artifacts such as discontinuities, holes, or mismatched connections between different parts. In this paper, we propose a shape-aware network based on diffusion models for 3D image reconstruction, named SADIR, to address these issues. In contrast to previous methods that primarily rely on spatial correlations of image intensities for 3D reconstruction, our model leverages shape priors learned from the training data to guide the reconstruction process. To achieve this, we develop a joint learning network that simultaneously learns a mean shape under deformation models. Each reconstructed image is then considered as a deformed variant of the mean shape. We validate our model, SADIR, on both brain and cardiac magnetic resonance images (MRIs). Experimental results show that our method outperforms the baselines with lower reconstruction error and better preservation of the shape structure of objects within the images.
CVNov 19, 2024Code
Invariant Shape Representation Learning For Image ClassificationTonmoy Hossain, Jing Ma, Jundong Li et al.
Geometric shape features have been widely used as strong predictors for image classification. Nevertheless, most existing classifiers such as deep neural networks (DNNs) directly leverage the statistical correlations between these shape features and target variables. However, these correlations can often be spurious and unstable across different environments (e.g., in different age groups, certain types of brain changes have unstable relations with neurodegenerative disease); hence leading to biased or inaccurate predictions. In this paper, we introduce a novel framework that for the first time develops invariant shape representation learning (ISRL) to further strengthen the robustness of image classifiers. In contrast to existing approaches that mainly derive features in the image space, our model ISRL is designed to jointly capture invariant features in latent shape spaces parameterized by deformable transformations. To achieve this goal, we develop a new learning paradigm based on invariant risk minimization (IRM) to learn invariant representations of image and shape features across multiple training distributions/environments. By embedding the features that are invariant with regard to target variables in different environments, our model consistently offers more accurate predictions. We validate our method by performing classification tasks on both simulated 2D images, real 3D brain and cine cardiovascular magnetic resonance images (MRIs). Our code is publicly available at https://github.com/tonmoy-hossain/ISRL.
CVDec 20, 2023Code
MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image DeformationsTonmoy Hossain, Miaomiao Zhang
Geometric transformations have been widely used to augment the size of training images. Existing methods often assume a unimodal distribution of the underlying transformations between images, which limits their power when data with multimodal distributions occur. In this paper, we propose a novel model, Multimodal Geometric Augmentation (MGAug), that for the first time generates augmenting transformations in a multimodal latent space of geometric deformations. To achieve this, we first develop a deep network that embeds the learning of latent geometric spaces of diffeomorphic transformations (a.k.a. diffeomorphisms) in a variational autoencoder (VAE). A mixture of multivariate Gaussians is formulated in the tangent space of diffeomorphisms and serves as a prior to approximate the hidden distribution of image transformations. We then augment the original training dataset by deforming images using randomly sampled transformations from the learned multimodal latent space of VAE. To validate the efficiency of our model, we jointly learn the augmentation strategy with two distinct domain-specific tasks: multi-class classification on 2D synthetic datasets and segmentation on real 3D brain magnetic resonance images (MRIs). We also compare MGAug with state-of-the-art transformation-based image augmentation algorithms. Experimental results show that our proposed approach outperforms all baselines by significantly improved prediction accuracy. Our code is publicly available at https://github.com/tonmoy-hossain/MGAug.
CVMar 30
Curriculum-Guided Myocardial Scar Segmentation for Ischemic and Non-ischemic CardiomyopathyNivetha Jayakumar, Jonathan Pan, Shuo Wang et al.
Identification and quantification of myocardial scar is important for diagnosis and prognosis of cardiovascular diseases. However, reliable scar segmentation from Late Gadolinium Enhancement Cardiac Magnetic Resonance (LGE-CMR) images remains a challenge due to variations in contrast enhancement across patients, suboptimal imaging conditions such as post contrast washout, and inconsistencies in ground truth annotations on diffuse scars caused by inter observer variability. In this work, we propose a curriculum learning-based framework designed to improve segmentation performance under these challenging conditions. The method introduces a progressive training strategy that guides the model from high-confidence, clearly defined scar regions to low confidence or visually ambiguous samples with limited scar burden. By structuring the learning process in this manner, the network develops robustness to uncertain labels and subtle scar appearances that are often underrepresented in conventional training pipelines. Experimental results show that the proposed approach enhances segmentation accuracy and consistency, particularly for cases with minimal or diffuse scar, outperforming standard training baselines. This strategy provides a principled way to leverage imperfect data for improved myocardial scar quantification in clinical applications. Our code is publicly available on GitHub.
CVMay 13, 2025Code
IrrMap: A Large-Scale Comprehensive Dataset for Irrigation Method MappingNibir Chandra Mandal, Oishee Bintey Hoque, Abhijin Adiga et al.
We introduce IrrMap, the first large-scale dataset (1.1 million patches) for irrigation method mapping across regions. IrrMap consists of multi-resolution satellite imagery from LandSat and Sentinel, along with key auxiliary data such as crop type, land use, and vegetation indices. The dataset spans 1,687,899 farms and 14,117,330 acres across multiple western U.S. states from 2013 to 2023, providing a rich and diverse foundation for irrigation analysis and ensuring geospatial alignment and quality control. The dataset is ML-ready, with standardized 224x224 GeoTIFF patches, the multiple input modalities, carefully chosen train-test-split data, and accompanying dataloaders for seamless deep learning model training andbenchmarking in irrigation mapping. The dataset is also accompanied by a complete pipeline for dataset generation, enabling researchers to extend IrrMap to new regions for irrigation data collection or adapt it with minimal effort for other similar applications in agricultural and geospatial analysis. We also analyze the irrigation method distribution across crop groups, spatial irrigation patterns (using Shannon diversity indices), and irrigated area variations for both LandSat and Sentinel, providing insights into regional and resolution-based differences. To promote further exploration, we openly release IrrMap, along with the derived datasets, benchmark models, and pipeline code, through a GitHub repository: https://github.com/Nibir088/IrrMap and Data repository: https://huggingface.co/Nibir/IrrMap, providing comprehensive documentation and implementation details.
CVMar 21, 2025Code
CoRLD: Contrastive Representation Learning Of Deformable Shapes In ImagesTonmoy Hossain, Miaomiao Zhang
Deformable shape representations, parameterized by deformations relative to a given template, have proven effective for improved image analysis tasks. However, their broader applicability is hindered by two major challenges. First, existing methods mainly rely on a known template during testing, which is impractical and limits flexibility. Second, they often struggle to capture fine-grained, voxel-level distinctions between similar shapes (e.g., anatomical variations among healthy individuals, those with mild cognitive impairment, and diseased states). To address these limitations, we propose a novel framework - Contrastive Representation Learning of Deformable shapes (CoRLD) in learned deformation spaces and demonstrate its effectiveness in the context of image classification. Our CoRLD leverages a class-aware contrastive supervised learning objective in latent deformation spaces, promoting proximity among representations of similar classes while ensuring separation of dissimilar groups. In contrast to previous deep learning networks that require a reference image as input to predict deformation changes, our approach eliminates this dependency. Instead, template images are utilized solely as ground truth in the loss function during the training process, making our model more flexible and generalizable to a wide range of medical applications. We validate CoRLD on diverse datasets, including real brain magnetic resonance imaging (MRIs) and adrenal shapes derived from computed tomography (CT) scans. Experimental results show that our model effectively extracts deformable shape features, which can be easily integrated with existing classifiers to substantially boost the classification accuracy. Our code is available at GitHub.
CVNov 22, 2024Code
TPIE: Topology-Preserved Image Editing With Text InstructionsNivetha Jayakumar, Srivardhan Reddy Gadila, Tonmoy Hossain et al.
Preserving topological structures is important in real-world applications, particularly in sensitive domains such as healthcare and medicine, where the correctness of human anatomy is critical. However, most existing image editing models focus on manipulating intensity and texture features, often overlooking object geometry within images. To address this issue, this paper introduces a novel method, Topology-Preserved Image Editing with text instructions (TPIE), that for the first time ensures the topology and geometry remaining intact in edited images through text-guided generative diffusion models. More specifically, our method treats newly generated samples as deformable variations of a given input template, allowing for controllable and structure-preserving edits. Our proposed TPIE framework consists of two key modules: (i) an autoencoder-based registration network that learns latent representations of object transformations, parameterized by velocity fields, from pairwise training images; and (ii) a novel latent conditional geometric diffusion (LCDG) model efficiently capturing the data distribution of learned transformation features conditioned on custom-defined text instructions. We validate TPIE on a diverse set of 2D and 3D images and compare them with state-of-the-art image editing approaches. Experimental results show that our method outperforms other baselines in generating more realistic images with well-preserved topology. Our code will be made publicly available on Github.
IVJan 3, 2022Code
BDG-Net: Boundary Distribution Guided Network for Accurate Polyp SegmentationZihuan Qiu, Zhichuan Wang, Miaomiao Zhang et al.
Colorectal cancer (CRC) is one of the most common fatal cancer in the world. Polypectomy can effectively interrupt the progression of adenoma to adenocarcinoma, thus reducing the risk of CRC development. Colonoscopy is the primary method to find colonic polyps. However, due to the different sizes of polyps and the unclear boundary between polyps and their surrounding mucosa, it is challenging to segment polyps accurately. To address this problem, we design a Boundary Distribution Guided Network (BDG-Net) for accurate polyp segmentation. Specifically, under the supervision of the ideal Boundary Distribution Map (BDM), we use Boundary Distribution Generate Module (BDGM) to aggregate high-level features and generate BDM. Then, BDM is sent to the Boundary Distribution Guided Decoder (BDGD) as complementary spatial information to guide the polyp segmentation. Moreover, a multi-scale feature interaction strategy is adopted in BDGD to improve the segmentation accuracy of polyps with different sizes. Extensive quantitative and qualitative evaluations demonstrate the effectiveness of our model, which outperforms state-of-the-art models remarkably on five public polyp datasets while maintaining low computational complexity. Code: https://github.com/zihuanqiu/BDG-Net
CVJul 12, 2021Code
Bayesian Atlas Building with Hierarchical Priors for Subject-specific RegularizationJian Wang, Miaomiao Zhang
This paper presents a novel hierarchical Bayesian model for unbiased atlas building with subject-specific regularizations of image registration. We develop an atlas construction process that automatically selects parameters to control the smoothness of diffeomorphic transformation according to individual image data. To achieve this, we introduce a hierarchical prior distribution on regularization parameters that allows multiple penalties on images with various degrees of geometric transformations. We then treat the regularization parameters as latent variables and integrate them out from the model by using the Monte Carlo Expectation Maximization (MCEM) algorithm. Another advantage of our algorithm is that it eliminates the need for manual parameter tuning, which can be tedious and infeasible. We demonstrate the effectiveness of our model on 3D brain MR images. Experimental results show that our model provides a sharper atlas compared to the current atlas building algorithms with single-penalty regularizations. Our code is publicly available at https://github.com/jw4hv/HierarchicalBayesianAtlasBuild.
IVApr 5, 2020Code
DeepFLASH: An Efficient Network for Learning-based Medical Image RegistrationJian Wang, Miaomiao Zhang
This paper presents DeepFLASH, a novel network with efficient training and inference for learning-based medical image registration. In contrast to existing approaches that learn spatial transformations from training data in the high dimensional imaging space, we develop a new registration network entirely in a low dimensional bandlimited space. This dramatically reduces the computational cost and memory footprint of an expensive training and inference. To achieve this goal, we first introduce complex-valued operations and representations of neural architectures that provide key components for learning-based registration models. We then construct an explicit loss function of transformation fields fully characterized in a bandlimited space with much fewer parameterizations. Experimental results show that our method is significantly faster than the state-of-the-art deep learning based image registration methods, while producing equally accurate alignment. We demonstrate our algorithm in two different applications of image registration: 2D synthetic data and 3D real brain magnetic resonance (MR) images. Our code is available at https://github.com/jw4hv/deepflash.
NINov 7, 2023
Emulators in JINSPLei Zhao, Miaomiao Zhang, Lv Zhe
JINSP(Jiutian Intelligence Network Simulation Platform) describes a series of basic emulators and their combinations, such as the simulation of the protocol stack for dynamic users in a real environment, which is composed of user behavior simulation, base station simulation, and terminal simulation. It is applied in specific business scenarios, such as multi-target antenna optimization, compression feedback, and so on. This paper provides detailed descriptions of each emulator and its combination based on this foundation, including the implementation process of the emulator, integration with the platform, experimental results, and other aspects.
CVFeb 28, 2024
Multimodal Learning To Improve Cardiac Late Mechanical Activation Detection From Cine MR ImagesJiarui Xing, Nian Wu, Kenneth Bilchick et al.
This paper presents a multimodal deep learning framework that utilizes advanced image techniques to improve the performance of clinical analysis heavily dependent on routinely acquired standard images. More specifically, we develop a joint learning network that for the first time leverages the accuracy and reproducibility of myocardial strains obtained from Displacement Encoding with Stimulated Echo (DENSE) to guide the analysis of cine cardiac magnetic resonance (CMR) imaging in late mechanical activation (LMA) detection. An image registration network is utilized to acquire the knowledge of cardiac motions, an important feature estimator of strain values, from standard cine CMRs. Our framework consists of two major components: (i) a DENSE-supervised strain network leveraging latent motion features learned from a registration network to predict myocardial strains; and (ii) a LMA network taking advantage of the predicted strain for effective LMA detection. Experimental results show that our proposed work substantially improves the performance of strain analysis and LMA detection from cine CMR images, aligning more closely with the achievements of DENSE.
CVOct 24, 2024
Learning Geodesics of Geometric Shape Deformations From ImagesNian Wu, Miaomiao Zhang
This paper presents a novel method, named geodesic deformable networks (GDN), that for the first time enables the learning of geodesic flows of deformation fields derived from images. In particular, the capability of our proposed GDN being able to predict geodesics is important for quantifying and comparing deformable shape presented in images. The geodesic deformations, also known as optimal transformations that align pairwise images, are often parameterized by a time sequence of smooth vector fields governed by nonlinear differential equations. A bountiful literature has been focusing on learning the initial conditions (e.g., initial velocity fields) based on registration networks. However, the definition of geodesics central to deformation-based shape analysis is blind to the networks. To address this problem, we carefully develop an efficient neural operator to treat the geodesics as unknown mapping functions learned from the latent deformation spaces. A composition of integral operators and smooth activation functions is then formulated to effectively approximate such mappings. In contrast to previous works, our GDN jointly optimizes a newly defined geodesic loss, which adds additional benefits to promote the network regularizability and generalizability. We demonstrate the effectiveness of GDN on both 2D synthetic data and 3D real brain magnetic resonance imaging (MRI).
IVFeb 5, 2024
An end-to-end deep learning pipeline to derive blood input with partial volume corrections for automated parametric brain PET mappingRugved Chavan, Gabriel Hyman, Zoraiz Qureshi et al.
Dynamic 2-[18F] fluoro-2-deoxy-D-glucose positron emission tomography (dFDG-PET) for human brain imaging has considerable clinical potential, yet its utilization remains limited. A key challenge in the quantitative analysis of dFDG-PET is characterizing a patient-specific blood input function, traditionally reliant on invasive arterial blood sampling. This research introduces a novel approach employing non-invasive deep learning model-based computations from the internal carotid arteries (ICA) with partial volume (PV) corrections, thereby eliminating the need for invasive arterial sampling. We present an end-to-end pipeline incorporating a 3D U-Net based ICA-net for ICA segmentation, alongside a Recurrent Neural Network (RNN) based MCIF-net for the derivation of a model-corrected blood input function (MCIF) with PV corrections. The developed 3D U-Net and RNN was trained and validated using a 5-fold cross-validation approach on 50 human brain FDG PET datasets. The ICA-net achieved an average Dice score of 82.18% and an Intersection over Union of 68.54% across all tested scans. Furthermore, the MCIF-net exhibited a minimal root mean squared error of 0.0052. The application of this pipeline to ground truth data for dFDG-PET brain scans resulted in the precise localization of seizure onset regions, which contributed to a successful clinical outcome, with the patient achieving a seizure-free state after treatment. These results underscore the efficacy of the ICA-net and MCIF-net deep learning pipeline in learning the ICA structure's distribution and automating MCIF computation with PV corrections. This advancement marks a significant leap in non-invasive neuroimaging.
HCOct 14, 2025
Who is a Better Matchmaker? Human vs. Algorithmic Judge Assignment in a High-Stakes Startup CompetitionSarina Xi, Orelia Pi, Miaomiao Zhang et al.
There is growing interest in applying artificial intelligence (AI) to automate and support complex decision-making tasks. However, it remains unclear how algorithms compare to human judgment in contexts requiring semantic understanding and domain expertise. We examine this in the context of the judge assignment problem, matching submissions to suitably qualified judges. Specifically, we tackled this problem at the Harvard President's Innovation Challenge, the university's premier venture competition awarding over \$500,000 to student and alumni startups. This represents a real-world environment where high-quality judge assignment is essential. We developed an AI-based judge-assignment algorithm, Hybrid Lexical-Semantic Similarity Ensemble (HLSE), and deployed it at the competition. We then evaluated its performance against human expert assignments using blinded match-quality scores from judges on $309$ judge-venture pairs. Using a Mann-Whitney U statistic based test, we found no statistically significant difference in assignment quality between the two approaches ($AUC=0.48, p=0.40$); on average, algorithmic matches are rated $3.90$ and manual matches $3.94$ on a 5-point scale, where 5 indicates an excellent match. Furthermore, manual assignments that previously required a full week could be automated in several hours by the algorithm during deployment. These results demonstrate that HLSE achieves human-expert-level matching quality while offering greater scalability and efficiency, underscoring the potential of AI-driven solutions to support and enhance human decision-making for judge assignment in high-stakes settings.
AIOct 9, 2025
Control Synthesis of Cyber-Physical Systems for Real-Time Specifications through Causation-Guided Reinforcement LearningXiaochen Tang, Zhenya Zhang, Miaomiao Zhang et al.
In real-time and safety-critical cyber-physical systems (CPSs), control synthesis must guarantee that generated policies meet stringent timing and correctness requirements under uncertain and dynamic conditions. Signal temporal logic (STL) has emerged as a powerful formalism of expressing real-time constraints, with its semantics enabling quantitative assessment of system behavior. Meanwhile, reinforcement learning (RL) has become an important method for solving control synthesis problems in unknown environments. Recent studies incorporate STL-based reward functions into RL to automatically synthesize control policies. However, the automatically inferred rewards obtained by these methods represent the global assessment of a whole or partial path but do not accumulate the rewards of local changes accurately, so the sparse global rewards may lead to non-convergence and unstable training performances. In this paper, we propose an online reward generation method guided by the online causation monitoring of STL. Our approach continuously monitors system behavior against an STL specification at each control step, computing the quantitative distance toward satisfaction or violation and thereby producing rewards that reflect instantaneous state dynamics. Additionally, we provide a smooth approximation of the causation semantics to overcome the discontinuity of the causation semantics and make it differentiable for using deep-RL methods. We have implemented a prototype tool and evaluated it in the Gym environment on a variety of continuously controlled benchmarks. Experimental results show that our proposed STL-guided RL method with online causation semantics outperforms existing relevant STL-guided RL methods, providing a more robust and efficient reward generation framework for deep-RL.
FLSep 8, 2025
On Synthesis of Timed Regular ExpressionsZiran Wang, Jie An, Naijun Zhan et al.
Timed regular expressions serve as a formalism for specifying real-time behaviors of Cyber-Physical Systems. In this paper, we consider the synthesis of timed regular expressions, focusing on generating a timed regular expression consistent with a given set of system behaviors including positive and negative examples, i.e., accepting all positive examples and rejecting all negative examples. We first prove the decidability of the synthesis problem through an exploration of simple timed regular expressions. Subsequently, we propose our method of generating a consistent timed regular expression with minimal length, which unfolds in two steps. The first step is to enumerate and prune candidate parametric timed regular expressions. In the second step, we encode the requirement that a candidate generated by the first step is consistent with the given set into a Satisfiability Modulo Theories (SMT) formula, which is consequently solved to determine a solution to parametric time constraints. Finally, we evaluate our approach on benchmarks, including randomly generated behaviors from target timed models and a case study.
AIDec 14, 2023
Implement services for business scenarios by combining basic emulatorsLei Zhao, Miaomiao Zhang
This article mainly introduces how to use various basic emulators to form a combined emulator in the Jiutian Intelligence Network Simulation Platform to realize simulation service functions in different business scenarios. Among them, the combined emulator is included. The business scenarios include different practical applications such as multi-objective antenna optimization, high traffic of business, CSI (channel state information) compression feedback, etc.
CVDec 13, 2021
Hybrid Atlas Building with Deep Registration PriorsNian Wu, Jian Wang, Miaomiao Zhang et al.
Registration-based atlas building often poses computational challenges in high-dimensional image spaces. In this paper, we introduce a novel hybrid atlas building algorithm that fast estimates atlas from large-scale image datasets with much reduced computational cost. In contrast to previous approaches that iteratively perform registration tasks between an estimated atlas and individual images, we propose to use learned priors of registration from pre-trained neural networks. This newly developed hybrid framework features several advantages of (i) providing an efficient way of atlas building without losing the quality of results, and (ii) offering flexibility in utilizing a wide variety of deep learning based registration methods. We demonstrate the effectiveness of this proposed model on 3D brain magnetic resonance imaging (MRI) scans.
CRMar 4, 2021
Defending Medical Image Diagnostics against Privacy Attacks using Generative MethodsWilliam Paul, Yinzhi Cao, Miaomiao Zhang et al.
Machine learning (ML) models used in medical imaging diagnostics can be vulnerable to a variety of privacy attacks, including membership inference attacks, that lead to violations of regulations governing the use of medical data and threaten to compromise their effective deployment in the clinic. In contrast to most recent work in privacy-aware ML that has been focused on model alteration and post-processing steps, we propose here a novel and complementary scheme that enhances the security of medical data by controlling the data sharing process. We develop and evaluate a privacy defense protocol based on using a generative adversarial network (GAN) that allows a medical data sourcer (e.g. a hospital) to provide an external agent (a modeler) a proxy dataset synthesized from the original images, so that the resulting diagnostic systems made available to model consumers is rendered resilient to privacy attackers. We validate the proposed method on retinal diagnostics AI used for diabetic retinopathy that bears the risk of possibly leaking private information. To incorporate concerns of both privacy advocates and modelers, we introduce a metric to evaluate privacy and utility performance in combination, and demonstrate, using these novel and classical metrics, that our approach, by itself or in conjunction with other defenses, provides state of the art (SOTA) performance for defending against privacy attacks.
IVNov 28, 2020
Deep Learning for Regularization Prediction in Diffeomorphic Image RegistrationJian Wang, Miaomiao Zhang
This paper presents a predictive model for estimating regularization parameters of diffeomorphic image registration. We introduce a novel framework that automatically determines the parameters controlling the smoothness of diffeomorphic transformations. Our method significantly reduces the effort of parameter tuning, which is time and labor-consuming. To achieve the goal, we develop a predictive model based on deep convolutional neural networks (CNN) that learns the mapping between pairwise images and the regularization parameter of image registration. In contrast to previous methods that estimate such parameters in a high-dimensional image space, our model is built in an efficient bandlimited space with much lower dimensions. We demonstrate the effectiveness of our model on both 2D synthetic data and 3D real brain images. Experimental results show that our model not only predicts appropriate regularization parameters for image registration, but also improving the network training in terms of time and memory efficiency.
LGSep 3, 2019
Mixture Probabilistic Principal Geodesic AnalysisYoushan Zhang, Jiarui Xing, Miaomiao Zhang
Dimensionality reduction on Riemannian manifolds is challenging due to the complex nonlinear data structures. While probabilistic principal geodesic analysis~(PPGA) has been proposed to generalize conventional principal component analysis (PCA) onto manifolds, its effectiveness is limited to data with a single modality. In this paper, we present a novel Gaussian latent variable model that provides a unique way to integrate multiple PGA models into a maximum-likelihood framework. This leads to a well-defined mixture model of probabilistic principal geodesic analysis (MPPGA) on sub-populations, where parameters of the principal subspaces are automatically estimated by employing an Expectation Maximization algorithm. We further develop a mixture Bayesian PGA (MBPGA) model that automatically reduces data dimensionality by suppressing irrelevant principal geodesics. We demonstrate the advantages of our model in the contexts of clustering and statistical shape analysis, using synthetic sphere data, real corpus callosum, and mandible data from human brain magnetic resonance~(MR) and CT images.
CVMar 6, 2019
Temporal Registration in Application to In-utero MRI Time SeriesRuizhi Liao, Esra A. Turk, Miaomiao Zhang et al.
We present a robust method to correct for motion in volumetric in-utero MRI time series. Time-course analysis for in-utero volumetric MRI time series often suffers from substantial and unpredictable fetal motion. Registration provides voxel correspondences between images and is commonly employed for motion correction. Current registration methods often fail when aligning images that are substantially different from a template (reference image). To achieve accurate and robust alignment, we make a Markov assumption on the nature of motion and take advantage of the temporal smoothness in the image data. Forward message passing in the corresponding hidden Markov model (HMM) yields an estimation algorithm that only has to account for relatively small motion between consecutive frames. We evaluate the utility of the temporal model in the context of in-utero MRI time series alignment by examining the accuracy of propagated segmentation label maps. Our results suggest that the proposed model captures accurately the temporal dynamics of transformations in in-utero MRI time series.
CVMar 20, 2018
A Feature-Driven Active Framework for Ultrasound-Based Brain Shift CompensationJie Luo, Matt Toews, Ines Machado et al.
A reliable Ultrasound (US)-to-US registration method to compensate for brain shift would substantially improve Image-Guided Neurological Surgery. Developing such a registration method is very challenging, due to factors such as missing correspondence in images, the complexity of brain pathology and the demand for fast computation. We propose a novel feature-driven active framework. Here, landmarks and their displacement are first estimated from a pair of US images using corresponding local image features. Subsequently, a Gaussian Process (GP) model is used to interpolate a dense deformation field from the sparse landmarks. Kernels of the GP are estimated by using variograms and a discrete grid search method. If necessary, the user can actively add new landmarks based on the image context and visualization of the uncertainty measure provided by the GP to further improve the result. We retrospectively demonstrate our registration framework as a robust and accurate brain shift compensation solution on clinical data acquired during neurosurgery.
CVMar 14, 2018
On the Applicability of Registration UncertaintyJie Luo, Alireza Sedghi, Karteek Popuri et al.
Estimating the uncertainty in (probabilistic) image registration enables, e.g., surgeons to assess the operative risk based on the trustworthiness of the registered image data. If surgeons receive inaccurately calculated registration uncertainty and misplace unwarranted confidence in the alignment solutions, severe consequences may result. For probabilistic image registration (PIR), the predominant way to quantify the registration uncertainty is using summary statistics of the distribution of transformation parameters. The majority of existing research focuses on trying out different summary statistics as well as a means to exploit them. Distinctively, in this paper, we study two rarely examined topics: (1) whether those summary statistics of the transformation distribution most informatively represent the registration uncertainty; (2) Does utilizing the registration uncertainty always be beneficial. We show that there are two types of uncertainties: the transformation uncertainty, Ut, and label uncertainty Ul. The conventional way of using Ut to quantify Ul is inappropriate and can be misleading. By a real data experiment, we also share a potentially critical finding that making use of the registration uncertainty may not always be an improvement.
CVAug 14, 2016
SSHMT: Semi-supervised Hierarchical Merge Tree for Electron Microscopy Image SegmentationTing Liu, Miaomiao Zhang, Mehran Javanmardi et al.
Region-based methods have proven necessary for improving segmentation accuracy of neuronal structures in electron microscopy (EM) images. Most region-based segmentation methods use a scoring function to determine region merging. Such functions are usually learned with supervised algorithms that demand considerable ground truth data, which are costly to collect. We propose a semi-supervised approach that reduces this demand. Based on a merge tree structure, we develop a differentiable unsupervised loss term that enforces consistent predictions from the learned function. We then propose a Bayesian model that combines the supervised and the unsupervised information for probabilistic learning. The experimental results on three EM data sets demonstrate that by using a subset of only 3% to 7% of the entire ground truth data, our approach consistently performs close to the state-of-the-art supervised method with the full labeled data set, and significantly outperforms the supervised method with the same labeled subset.
CVAug 12, 2016
Temporal Registration in In-Utero Volumetric MRI Time SeriesRuizhi Liao, Esra Turk, Miaomiao Zhang et al.
We present a robust method to correct for motion and deformations for in-utero volumetric MRI time series. Spatio-temporal analysis of dynamic MRI requires robust alignment across time in the presence of substantial and unpredictable motion. We make a Markov assumption on the nature of deformations to take advantage of the temporal structure in the image data. Forward message passing in the corresponding hidden Markov model (HMM) yields an estimation algorithm that only has to account for relatively small motion between consecutive frames. We demonstrate the utility of the temporal model by showing that its use improves the accuracy of the segmentation propagation through temporal registration. Our results suggest that the proposed model captures accurately the temporal dynamics of deformations in in-utero MRI time series.