LGJun 14, 2023Code
Improving Generalization in Meta-Learning via Meta-Gradient AugmentationRen Wang, Haoliang Sun, Qi Wei et al.
Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes solve it by enhancing the mutual-exclusivity or diversity of training samples, but these data manipulation strategies are data-dependent and insufficiently flexible. This work alleviates overfitting in meta-learning from the perspective of gradient regularization and proposes a data-independent \textbf{M}eta-\textbf{G}radient \textbf{Aug}mentation (\textbf{MGAug}) method. The key idea is to first break the rote memories by network pruning to address memorization overfitting in the inner loop, and then the gradients of pruned sub-networks naturally form the high-quality augmentation of the meta-gradient to alleviate learner overfitting in the outer loop. Specifically, we explore three pruning strategies, including \textit{random width pruning}, \textit{random parameter pruning}, and a newly proposed \textit{catfish pruning} that measures a Meta-Memorization Carrying Amount (MMCA) score for each parameter and prunes high-score ones to break rote memories as much as possible. The proposed MGAug is theoretically guaranteed by the generalization bound from the PAC-Bayes framework. In addition, we extend a lightweight version, called MGAug-MaxUp, as a trade-off between performance gains and resource overhead. Extensive experiments on multiple few-shot learning benchmarks validate MGAug's effectiveness and significant improvement over various meta-baselines. The code is publicly available at \url{https://github.com/xxLifeLover/Meta-Gradient-Augmentation}.
CVAug 24, 2022
Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence PenalizationQi Wei, Haoliang Sun, Xiankai Lu et al.
Sample selection is an effective strategy to mitigate the effect of label noise in robust learning. Typical strategies commonly apply the small-loss criterion to identify clean samples. However, those samples lying around the decision boundary with large losses usually entangle with noisy examples, which would be discarded with this criterion, leading to the heavy degeneration of the generalization performance. In this paper, we propose a novel selection strategy, \textbf{S}elf-\textbf{F}il\textbf{t}ering (SFT), that utilizes the fluctuation of noisy examples in historical predictions to filter them, which can avoid the selection bias of the small-loss criterion for the boundary examples. Specifically, we introduce a memory bank module that stores the historical predictions of each example and dynamically updates to support the selection for the subsequent learning iteration. Besides, to reduce the accumulated error of the sample selection bias of SFT, we devise a regularization term to penalize the confident output distribution. By increasing the weight of the misclassified categories with this term, the loss function is robust to label noise in mild conditions. We conduct extensive experiments on three benchmarks with variant noise types and achieve the new state-of-the-art. Ablation studies and further analysis verify the virtue of SFT for sample selection in robust learning.
CVMar 4, 2023
Fine-Grained Classification with Noisy LabelsQi Wei, Lei Feng, Haoliang Sun et al.
Learning with noisy labels (LNL) aims to ensure model generalization given a label-corrupted training set. In this work, we investigate a rarely studied scenario of LNL on fine-grained datasets (LNL-FG), which is more practical and challenging as large inter-class ambiguities among fine-grained classes cause more noisy labels. We empirically show that existing methods that work well for LNL fail to achieve satisfying performance for LNL-FG, arising the practical need of effective solutions for LNL-FG. To this end, we propose a novel framework called stochastic noise-tolerated supervised contrastive learning (SNSCL) that confronts label noise by encouraging distinguishable representation. Specifically, we design a noise-tolerated supervised contrastive learning loss that incorporates a weight-aware mechanism for noisy label correction and selectively updating momentum queue lists. By this mechanism, we mitigate the effects of noisy anchors and avoid inserting noisy labels into the momentum-updated queue. Besides, to avoid manually-defined augmentation strategies in contrastive learning, we propose an efficient stochastic module that samples feature embeddings from a generated distribution, which can also enhance the representation ability of deep models. SNSCL is general and compatible with prevailing robust LNL strategies to improve their performance for LNL-FG. Extensive experiments demonstrate the effectiveness of SNSCL.
CVMar 11, 2023
MetaViewer: Towards A Unified Multi-View RepresentationRen Wang, Haoliang Sun, Yuling Ma et al.
Existing multi-view representation learning methods typically follow a specific-to-uniform pipeline, extracting latent features from each view and then fusing or aligning them to obtain the unified object representation. However, the manually pre-specify fusion functions and view-private redundant information mixed in features potentially degrade the quality of the derived representation. To overcome them, we propose a novel bi-level-optimization-based multi-view learning framework, where the representation is learned in a uniform-to-specific manner. Specifically, we train a meta-learner, namely MetaViewer, to learn fusion and model the view-shared meta representation in outer-level optimization. Start with this meta representation, view-specific base-learners are then required to rapidly reconstruct the corresponding view in inner-level. MetaViewer eventually updates by observing reconstruction processes from uniform to specific over all views, and learns an optimal fusion scheme that separates and filters out view-private information. Extensive experimental results in downstream tasks such as classification and clustering demonstrate the effectiveness of our method.
16.5CLMar 16
From Text to Forecasts: Bridging Modality Gap with Temporal Evolution Semantic SpaceLehui Li, Yuyao Wang, Jisheng Yan et al.
Incorporating textual information into time-series forecasting holds promise for addressing event-driven non-stationarity; however, a fundamental modality gap hinders effective fusion: textual descriptions express temporal impacts implicitly and qualitatively, whereas forecasting models rely on explicit and quantitative signals. Through controlled semi-synthetic experiments, we show that existing methods over-attend to redundant tokens and struggle to reliably translate textual semantics into usable numerical cues. To bridge this gap, we propose TESS, which introduces a Temporal Evolution Semantic Space as an intermediate bottleneck between modalities. This space consists of interpretable, numerically grounded temporal primitives (mean shift, volatility, shape, and lag) extracted from text by an LLM via structured prompting and filtered through confidence-aware gating. Experiments on four real-world datasets demonstrate up to a 29 percent reduction in forecasting error compared to state-of-the-art unimodal and multimodal baselines. The code will be released after acceptance.
74.3CLApr 9Code
Distributed Multi-Layer Editing for Rule-Level Knowledge in Large Language ModelsYating Wang, Wenting Zhao, Yaqi Zhao et al.
Large language models store not only isolated facts but also rules that support reasoning across symbolic expressions, natural language explanations, and concrete instances. Yet most model editing methods are built for fact-level knowledge, assuming that a target edit can be achieved through a localized intervention. This assumption does not hold for rule-level knowledge, where a single rule must remain consistent across multiple interdependent forms. We investigate this problem through a mechanistic study of rule-level knowledge editing. To support this study, we extend the RuleEdit benchmark from 80 to 200 manually verified rules spanning mathematics and physics. Fine-grained causal tracing reveals a form-specific organization of rule knowledge in transformer layers: formulas and descriptions are concentrated in earlier layers, while instances are more associated with middle layers. These results suggest that rule knowledge is not uniformly localized, and therefore cannot be reliably edited by a single-layer or contiguous-block intervention. Based on this insight, we propose Distributed Multi-Layer Editing (DMLE), which applies a shared early-layer update to formulas and descriptions and a separate middle-layer update to instances. While remaining competitive on standard editing metrics, DMLE achieves substantially stronger rule-level editing performance. On average, it improves instance portability and rule understanding by 13.91 and 50.19 percentage points, respectively, over the strongest baseline across GPT-J-6B, Qwen2.5-7B, Qwen2-7B, and LLaMA-3-8B. The code is available at https://github.com/Pepper66/DMLE.
61.4CVMar 17
EFF-Grasp: Energy-Field Flow Matching for Physics-Aware Dexterous Grasp GenerationYukun Zhao, Zichen Zhong, Yongshun Gong et al.
Denoising generative models have recently become the dominant paradigm for dexterous grasp generation, owing to their ability to model complex grasp distributions from large-scale data. However, existing diffusion-based methods typically formulate generation as a stochastic differential equation (SDE), which often requires many sequential denoising steps and introduces trajectory instability that can lead to physically infeasible grasps. In this paper, we propose EFF-Grasp, a novel Flow-Matching-based framework for physics-aware dexterous grasp generation. Specifically, we reformulate grasp synthesis as a deterministic ordinary differential equation (ODE) process, which enables efficient and stable generation through smooth probability flows. To further enforce physical feasibility, we introduce a training-free physics-aware energy guidance strategy. Our method defines an energy-guided target distribution using adapted explicit physical energy functions that capture key grasp constraints, and estimates the corresponding guidance term via a local Monte Carlo approximation during inference. In this way, EFF-Grasp dynamically steers the generation trajectory toward physically feasible regions without requiring additional physics-based training or simulation feedback. Extensive experiments on five benchmark datasets show that EFF-Grasp achieves superior performance in grasp quality and physical feasibility, while requiring substantially fewer sampling steps than diffusion-based baselines.
30.4CVApr 8
Holistic Optimal Label Selection for Robust Prompt Learning under Partial LabelsYaqi Zhao, Haoliang Sun, Yating Wang et al.
Prompt learning has gained significant attention as a parameter-efficient approach for adapting large pre-trained vision-language models to downstream tasks. However, when only partial labels are available, its performance is often limited by label ambiguity and insufficient supervisory information. To address this issue, we propose Holistic Optimal Label Selection (HopS), leveraging the generalization ability of pre-trained feature encoders through two complementary strategies. First, we design a local density-based filter that selects the top frequent labels from the nearest neighbors' candidate sets and uses the softmax scores to identify the most plausible label, capturing structural regularities in the feature space. Second, we introduce a global selection objective based on optimal transport that maps the uniform sampling distribution to the candidate label distributions across a batch. By minimizing the expected transport cost, it can determine the most likely label assignments. These two strategies work together to provide robust label selection from both local and global perspectives. Extensive experiments on eight benchmark datasets show that HopS consistently improves performance under partial supervision and outperforms all baselines. Those results highlight the merit of holistic label selection and offer a practical solution for prompt learning in weakly supervised settings.
29.1CVApr 14
DeferredSeg: A Multi-Expert Deferral Framework for Trustworthy Medical Image SegmentationQiuyu Tian, Haoliang Sun, Yunshan Wang et al.
Segmentation models based on deep neural networks demonstrate strong generalization for medical image segmentation. However, they often exhibit overconfidence or underconfidence, leading to unreliable confidence scores for segmentation masks, especially in ambiguous regions. This undermines the trustworthiness required for clinical deployment. Motivated by the learning-to-defer (L2D) paradigm, we introduce DeferredSeg, a deferral-aware segmentation framework, i.e., a Human--AI collaboration system that determines whether to defer predictions to human experts in specific regions. DeferredSeg extends the base segmentor with an aggregated deferral predictor and additional routing channels that dynamically route each pixel to either the base segmentor or a human expert. To train this routing efficiently, we introduce a pixel-wise surrogate collaboration loss that supervises deferral decisions. In addition, to preserve spatial coherence within deferral regions, we propose a spatial-coherence loss that enforces smooth deferral masks, thereby enhancing reliability. Beyond single-expert deferral, we further extend the framework to a multi-expert setting by introducing multiple discrepancy experts for collaborative decision-making. To prevent overloading or underutilizing individual experts, we further design a load-balancing penalty that evenly distributes workload across expert branches. We evaluate DeferredSeg on three challenging medical datasets using MedSAM and CENet as the base segmentor for fair comparison. Experimental results show that DeferredSeg consistently outperforms the baseline, demonstrating its effectiveness for trustworthy dense medical segmentation. Moreover, the proposed framework is model-agnostic and can be readily applied to other segmentation architectures.
LGDec 24, 2024Code
Towards Macro-AUC oriented Imbalanced Multi-Label Continual LearningYan Zhang, Guoqiang Wu, Bingzheng Wang et al.
In Continual Learning (CL), while existing work primarily focuses on the multi-class classification task, there has been limited research on Multi-Label Learning (MLL). In practice, MLL datasets are often class-imbalanced, making it inherently challenging, a problem that is even more acute in CL. Due to its sensitivity to imbalance, Macro-AUC is an appropriate and widely used measure in MLL. However, there is no research to optimize Macro-AUC in MLCL specifically. To fill this gap, in this paper, we propose a new memory replay-based method to tackle the imbalance issue for Macro-AUC-oriented MLCL. Specifically, inspired by recent theory work, we propose a new Reweighted Label-Distribution-Aware Margin (RLDAM) loss. Furthermore, to be compatible with the RLDAM loss, a new memory-updating strategy named Weight Retain Updating (WRU) is proposed to maintain the numbers of positive and negative instances of the original dataset in memory. Theoretically, we provide superior generalization analyses of the RLDAM-based algorithm in terms of Macro-AUC, separately in batch MLL and MLCL settings. This is the first work to offer theoretical generalization analyses in MLCL to our knowledge. Finally, a series of experimental results illustrate the effectiveness of our method over several baselines. Our codes are available at https://github.com/ML-Group-SDU/Macro-AUC-CL.
43.7LGMar 18
Variational Rectification Inference for Learning with Noisy LabelsHaoliang Sun, Qi Wei, Lei Feng et al.
Label noise has been broadly observed in real-world datasets. To mitigate the negative impact of overfitting to label noise for deep models, effective strategies (\textit{e.g.}, re-weighting, or loss rectification) have been broadly applied in prevailing approaches, which have been generally learned under the meta-learning scenario. Despite the robustness of noise achieved by the probabilistic meta-learning models, they usually suffer from model collapse that degenerates generalization performance. In this paper, we propose variational rectification inference (VRI) to formulate the adaptive rectification for loss functions as an amortized variational inference problem and derive the evidence lower bound under the meta-learning framework. Specifically, VRI is constructed as a hierarchical Bayes by treating the rectifying vector as a latent variable, which can rectify the loss of the noisy sample with the extra randomness regularization and is, therefore, more robust to label noise. To achieve the inference of the rectifying vector, we approximate its conditional posterior with an amortization meta-network. By introducing the variational term in VRI, the conditional posterior is estimated accurately and avoids collapsing to a Dirac delta function, which can significantly improve the generalization performance. The elaborated meta-network and prior network adhere to the smoothness assumption, enabling the generation of reliable rectification vectors. Given a set of clean meta-data, VRI can be efficiently meta-learned within the bi-level optimization programming. Besides, theoretical analysis guarantees that the meta-network can be efficiently learned with our algorithm. Comprehensive comparison experiments and analyses validate its effectiveness for robust learning with noisy labels, particularly in the presence of open-set noise.
23.8LGMar 11
Riemannian MeanFlow for One-Step Generation on ManifoldsZichen Zhong, Haoliang Sun, Yukun Zhao et al.
Flow Matching enables simulation-free training of generative models on Riemannian manifolds, yet sampling typically still relies on numerically integrating a probability-flow ODE. We propose Riemannian MeanFlow (RMF), extending MeanFlow to manifold-valued generation where velocities lie in location-dependent tangent spaces. RMF defines an average-velocity field via parallel transport and derives a Riemannian MeanFlow identity that links average and instantaneous velocities for intrinsic supervision. We make this identity practical in a log-map tangent representation, avoiding trajectory simulation and heavy geometric computations. For stable optimization, we decompose the RMF objective into two terms and apply conflict-aware multi-task learning to mitigate gradient interference. RMF also supports conditional generation via classifier-free guidance. Experiments on spheres, tori, and SO(3) demonstrate competitive one-step sampling with improved quality-efficiency trade-offs and substantially reduced sampling cost.
28.7CVMay 6
Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt LearningYating Wang, Yaqi Zhao, Yongshun Gong et al.
Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete prompt optimization improves interpretability, it usually depends on large external models, leading to high computational costs and limited scalability. In this paper, we propose Interpretable Prompt Learning (IPL), a hybrid framework that alternates between discrete semantic token selection and continuous prompt optimization. Specifically, IPL formulates semantic token selection as an approximate submodular optimization problem, encouraging tokens that are both human-understandable and semantically diverse. It further adopts an alternating optimization strategy to integrate discrete token selection with continuous prompt tuning, improving interpretability while preserving adaptability to downstream tasks. Our framework is plug-and-play, allowing seamless integration with existing prompt learning methods. Extensive experiments on multiple benchmarks show that IPL consistently improves both interpretability and accuracy across five representative prompt learning methods, providing an effective and scalable extension to existing frameworks.
LGFeb 3, 2025
CausalCOMRL: Context-Based Offline Meta-Reinforcement Learning with Causal RepresentationZhengzhe Zhang, Wenjia Meng, Haoliang Sun et al.
Context-based offline meta-reinforcement learning (OMRL) methods have achieved appealing success by leveraging pre-collected offline datasets to develop task representations that guide policy learning. However, current context-based OMRL methods often introduce spurious correlations, where task components are incorrectly correlated due to confounders. These correlations can degrade policy performance when the confounders in the test task differ from those in the training task. To address this problem, we propose CausalCOMRL, a context-based OMRL method that integrates causal representation learning. This approach uncovers causal relationships among the task components and incorporates the causal relationships into task representations, enhancing the generalizability of RL agents. We further improve the distinction of task representations from different tasks by using mutual information optimization and contrastive learning. Utilizing these causal task representations, we employ SAC to optimize policies on meta-RL benchmarks. Experimental results show that CausalCOMRL achieves better performance than other methods on most benchmarks.
LGNov 8, 2021
Learning to Rectify for Robust Learning with Noisy LabelsHaoliang Sun, Chenhui Guo, Qi Wei et al.
Label noise significantly degrades the generalization ability of deep models in applications. Effective strategies and approaches, \textit{e.g.} re-weighting, or loss correction, are designed to alleviate the negative impact of label noise when training a neural network. Those existing works usually rely on the pre-specified architecture and manually tuning the additional hyper-parameters. In this paper, we propose warped probabilistic inference (WarPI) to achieve adaptively rectifying the training procedure for the classification network within the meta-learning scenario. In contrast to the deterministic models, WarPI is formulated as a hierarchical probabilistic model by learning an amortization meta-network, which can resolve sample ambiguity and be therefore more robust to serious label noise. Unlike the existing approximated weighting function of directly generating weight values from losses, our meta-network is learned to estimate a rectifying vector from the input of the logits and labels, which has the capability of leveraging sufficient information lying in them. This provides an effective way to rectify the learning procedure for the classification network, demonstrating a significant improvement of the generalization ability. Besides, modeling the rectifying vector as a latent variable and learning the meta-network can be seamlessly integrated into the SGD optimization of the classification network. We evaluate WarPI on four benchmarks of robust learning with noisy labels and achieve the new state-of-the-art under variant noise types. Extensive study and analysis also demonstrate the effectiveness of our model.
CVAug 13, 2021
Learning Transferable Parameters for Unsupervised Domain AdaptationZhongyi Han, Haoliang Sun, Yilong Yin
Unsupervised domain adaptation (UDA) enables a learning machine to adapt from a labeled source domain to an unlabeled domain under the distribution shift. Thanks to the strong representation ability of deep neural networks, recent remarkable achievements in UDA resort to learning domain-invariant features. Intuitively, the hope is that a good feature representation, together with the hypothesis learned from the source domain, can generalize well to the target domain. However, the learning processes of domain-invariant features and source hypothesis inevitably involve domain-specific information that would degrade the generalizability of UDA models on the target domain. In this paper, motivated by the lottery ticket hypothesis that only partial parameters are essential for generalization, we find that only partial parameters are essential for learning domain-invariant information and generalizing well in UDA. Such parameters are termed transferable parameters. In contrast, the other parameters tend to fit domain-specific details and often fail to generalize, which we term as untransferable parameters. Driven by this insight, we propose Transferable Parameter Learning (TransPar) to reduce the side effect brought by domain-specific information in the learning process and thus enhance the memorization of domain-invariant information. Specifically, according to the distribution discrepancy degree, we divide all parameters into transferable and untransferable ones in each training iteration. We then perform separate updates rules for the two types of parameters. Extensive experiments on image classification and regression tasks (keypoint detection) show that TransPar outperforms prior arts by non-trivial margins. Moreover, experiments demonstrate that TransPar can be integrated into the most popular deep UDA networks and be easily extended to handle any data distribution shift scenarios.
CVMay 14, 2021
Attentional Prototype Inference for Few-Shot SegmentationHaoliang Sun, Xiankai Lu, Haochen Wang et al.
This paper aims to address few-shot segmentation. While existing prototype-based methods have achieved considerable success, they suffer from uncertainty and ambiguity caused by limited labeled examples. In this work, we propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation. We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution. The probabilistic modeling of the prototype enhances the model's generalization ability by handling the inherent uncertainty caused by limited data and intra-class variations of objects. To further enhance the model, we introduce a local latent variable to represent the attention map of each query image, which enables the model to attend to foreground objects while suppressing the background. The optimization of the proposed model is formulated as a variational Bayesian inference problem, which is established by amortized inference networks. We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods. We also provide comprehensive analyses and ablation studies to gain insight into the effectiveness of our method for few-shot segmentation.
LGMay 8, 2021
MetaKernel: Learning Variational Random Features with Limited LabelsYingjun Du, Haoliang Sun, Xiantong Zhen et al.
Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks. The crux of few-shot learning is to extract prior knowledge from related tasks to enable fast adaptation to a new task with a limited amount of data. In this paper, we propose meta-learning kernels with random Fourier features for few-shot learning, we call MetaKernel. Specifically, we propose learning variational random features in a data-driven manner to obtain task-specific kernels by leveraging the shared knowledge provided by related tasks in a meta-learning setting. We treat the random feature basis as the latent variable, which is estimated by variational inference. The shared knowledge from related tasks is incorporated into a context inference of the posterior, which we achieve via a long-short term memory module. To establish more expressive kernels, we deploy conditional normalizing flows based on coupling layers to achieve a richer posterior distribution over random Fourier bases. The resultant kernels are more informative and discriminative, which further improves the few-shot learning. To evaluate our method, we conduct extensive experiments on both few-shot image classification and regression tasks. A thorough ablation study demonstrates that the effectiveness of each introduced component in our method. The benchmark results on fourteen datasets demonstrate MetaKernel consistently delivers at least comparable and often better performance than state-of-the-art alternatives.
CVDec 23, 2020
Direct Estimation of Spinal Cobb Angles by Structured Multi-Output RegressionHaoliang Sun, Xiantong Zhen, Chris Bailey et al.
The Cobb angle that quantitatively evaluates the spinal curvature plays an important role in the scoliosis diagnosis and treatment. Conventional measurement of these angles suffers from huge variability and low reliability due to intensive manual intervention. However, since there exist high ambiguity and variability around boundaries of vertebrae, it is challenging to obtain Cobb angles automatically. In this paper, we formulate the estimation of the Cobb angles from spinal X-rays as a multi-output regression task. We propose structured support vector regression (S^2VR) to jointly estimate Cobb angles and landmarks of the spine in X-rays in one single framework. The proposed S^2VR can faithfully handle the nonlinear relationship between input images and quantitative outputs, while explicitly capturing the intrinsic correlation of outputs. We introduce the manifold regularization to exploit the geometry of the output space. We propose learning the kernel in S2VR by kernel target alignment to enhance its discriminative ability. The proposed method is evaluated on the spinal X-rays dataset of 439 scoliosis subjects, which achieves the inspiring correlation coefficient of 92.76% with ground truth obtained manually by human experts and outperforms two baseline methods. Our method achieves the direct estimation of Cobb angles with high accuracy, which indicates its great potential in clinical use.
LGJun 11, 2020
Learning to Learn Kernels with Variational Random FeaturesXiantong Zhen, Haoliang Sun, Yingjun Du et al.
In this work, we introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability. We propose meta variational random features (MetaVRF) to learn adaptive kernels for the base-learner, which is developed in a latent variable model by treating the random feature basis as the latent variable. We formulate the optimization of MetaVRF as a variational inference problem by deriving an evidence lower bound under the meta-learning framework. To incorporate shared knowledge from related tasks, we propose a context inference of the posterior, which is established by an LSTM architecture. The LSTM-based inference network can effectively integrate the context information of previous tasks with task-specific information, generating informative and adaptive features. The learned MetaVRF can produce kernels of high representational power with a relatively low spectral sampling rate and also enables fast adaptation to new tasks. Experimental results on a variety of few-shot regression and classification tasks demonstrate that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
IVAug 21, 2019
DUAL-GLOW: Conditional Flow-Based Generative Model for Modality TransferHaoliang Sun, Ronak Mehta, Hao H. Zhou et al.
Positron emission tomography (PET) imaging is an imaging modality for diagnosing a number of neurological diseases. In contrast to Magnetic Resonance Imaging (MRI), PET is costly and involves injecting a radioactive substance into the patient. Motivated by developments in modality transfer in vision, we study the generation of certain types of PET images from MRI data. We derive new flow-based generative models which we show perform well in this small sample size regime (much smaller than dataset sizes available in standard vision tasks). Our formulation, DUAL-GLOW, is based on two invertible networks and a relation network that maps the latent spaces to each other. We discuss how given the prior distribution, learning the conditional distribution of PET given the MRI image reduces to obtaining the conditional distribution between the two latent codes w.r.t. the two image types. We also extend our framework to leverage 'side' information (or attributes) when available. By controlling the PET generation through 'conditioning' on age, our model is also able to capture brain FDG-PET (hypometabolism) changes, as a function of age. We present experiments on the Alzheimers Disease Neuroimaging Initiative (ADNI) dataset with 826 subjects, and obtain good performance in PET image synthesis, qualitatively and quantitatively better than recent works.