CVNov 25, 2023Code
Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual NoiseZhenning Shi, Haoshuai Zheng, Chen Xu et al.
Recently, research on denoising diffusion models has expanded its application to the field of image restoration. Traditional diffusion-based image restoration methods utilize degraded images as conditional input to effectively guide the reverse generation process, without modifying the original denoising diffusion process. However, since the degraded images already include low-frequency information, starting from Gaussian white noise will result in increased sampling steps. We propose Resfusion, a general framework that incorporates the residual term into the diffusion forward process, starting the reverse process directly from the noisy degraded images. The form of our inference process is consistent with the DDPM. We introduced a weighted residual noise, named resnoise, as the prediction target and explicitly provide the quantitative relationship between the residual term and the noise term in resnoise. By leveraging a smooth equivalence transformation, Resfusion determine the optimal acceleration step and maintains the integrity of existing noise schedules, unifying the training and inference processes. The experimental results demonstrate that Resfusion exhibits competitive performance on ISTD dataset, LOL dataset and Raindrop dataset with only five sampling steps. Furthermore, Resfusion can be easily applied to image generation and emerges with strong versatility. Our code and model are available at https://github.com/nkicsl/Resfusion.
CVJul 4, 2024
C$^3$DG: Conditional Domain Generalization for Hyperspectral Imagery Classification with Convergence and Constrained-risk TheoriesZhe Gao, Bin Pan, Zhenwei Shi
Hyperspectral imagery (HSI) classification may suffer the challenge of hyperspectral-monospectra, where different classes present similar spectra. Joint spatial-spectral feature extraction is a popular solution for the problem, but this strategy tends to inflate accuracy since test pixels may exist in training patches. Domain generalization methods show promising potential, but they still fail to distinguish similar spectra across varying domains, in addition, the theoretical support is usually ignored. In this paper, we only rely on spectral information to solve the hyperspectral-monospectra problem, and propose a Convergence and Error-Constrained Conditional Domain Generalization method for Hyperspectral Imagery Classification (C$^3$DG). The major contributions of this paper include two aspects: the Conditional Revising Inference Block (CRIB), and the corresponding theories for model convergence and generalization errors. CRIB is the kernel structure of the proposed method, which employs a shared encoder and multi-branch decoders to fully leverage the conditional distribution during training, achieving a decoupling that aligns with the generation mechanisms of HSI. Moreover, to ensure model convergence and maintain controllable error, we propose the optimization convergence theorem and risk upper bound theorem. In the optimization convergence theorem, we ensure the model convergence by demonstrating that the gradients of the loss terms are not contradictory. In the risk upper bound theorem, our theoretical analysis explores the relationship between test-time training and recent related work to establish a concrete bound for error. Experimental results on three benchmark datasets indicate the superiority of C$^3$DG.
LGOct 25, 2023
Bayesian Domain Invariant Learning via Posterior Generalization of Parameter DistributionsShiyu Shen, Bin Pan, Tianyang Shi et al.
Domain invariant learning aims to learn models that extract invariant features over various training domains, resulting in better generalization to unseen target domains. Recently, Bayesian Neural Networks have achieved promising results in domain invariant learning, but most works concentrate on aligning features distributions rather than parameter distributions. Inspired by the principle of Bayesian Neural Network, we attempt to directly learn the domain invariant posterior distribution of network parameters. We first propose a theorem to show that the invariant posterior of parameters can be implicitly inferred by aggregating posteriors on different training domains. Our assumption is more relaxed and allows us to extract more domain invariant information. We also propose a simple yet effective method, named PosTerior Generalization (PTG), that can be used to estimate the invariant parameter distribution. PTG fully exploits variational inference to approximate parameter distributions, including the invariant posterior and the posteriors on training domains. Furthermore, we develop a lite version of PTG for widespread applications. PTG shows competitive performance on various domain generalization benchmarks on DomainBed. Additionally, PTG can use any existing domain generalization methods as its prior, and combined with previous state-of-the-art method the performance can be further improved. Code will be made public.
LGOct 19, 2023
Be Bayesian by Attachments to Catch More UncertaintyShiyu Shen, Bin Pan, Tianyang Shi et al.
Bayesian Neural Networks (BNNs) have become one of the promising approaches for uncertainty estimation due to the solid theorical foundations. However, the performance of BNNs is affected by the ability of catching uncertainty. Instead of only seeking the distribution of neural network weights by in-distribution (ID) data, in this paper, we propose a new Bayesian Neural Network with an Attached structure (ABNN) to catch more uncertainty from out-of-distribution (OOD) data. We first construct a mathematical description for the uncertainty of OOD data according to the prior distribution, and then develop an attached Bayesian structure to integrate the uncertainty of OOD data into the backbone network. ABNN is composed of an expectation module and several distribution modules. The expectation module is a backbone deep network which focuses on the original task, and the distribution modules are mini Bayesian structures which serve as attachments of the backbone. In particular, the distribution modules aim at extracting the uncertainty from both ID and OOD data. We further provide theoretical analysis for the convergence of ABNN, and experimentally validate its superiority by comparing with some state-of-the-art uncertainty estimation methods Code will be made available.
32.8CVApr 16
WILD-SAM: Phase-Aware Expert Adaptation of SAM for Landslide Detection in Wrapped InSAR InterferogramsYucheng Pan, Heping Li, Zhangle Liu et al.
Detecting slow-moving landslides directly from wrapped Interferometric Synthetic Aperture Radar (InSAR) interferograms is crucial for efficient geohazard monitoring, yet it remains fundamentally challenged by severe phase ambiguity and complex coherence noise. While the Segment Anything Model (SAM) offers a powerful foundation for segmentation, its direct transfer to wrapped phase data is hindered by a profound spectral domain shift, which suppresses the high-frequency fringes essential for boundary delineation. To bridge this gap, we propose WILD-SAM, a novel parameter-efficient fine-tuning framework specifically designed to adapt SAM for high-precision landslide detection on wrapped interferograms. Specifically, the architecture integrates a Phase-Aware Mixture-of-Experts (PA-MoE) Adapter into the frozen encoder to align spectral distributions and introduces a Wavelet-Guided Subband Enhancement (WGSE) strategy to generate frequency-aware dense prompts. The PA-MoE Adapter exploits a dynamic routing mechanism across heterogeneous convolutional experts to adaptively aggregate multi-scale spectral-textural priors, effectively aligning the distribution discrepancy between natural images and interferometric phase data. Meanwhile, the WGSE strategy leverages discrete wavelet transforms to explicitly disentangle high-frequency subbands and refine directional phase textures, injecting these structural cues as dense prompts to ensure topological integrity along sharp landslide boundaries. Extensive experiments on the ISSLIDE and ISSLIDE+ benchmarks demonstrate that WILD-SAM achieves state-of-the-art performance, significantly outperforming existing methods in both target completeness and contour fidelity.
IVMar 7, 2021Code
Feedback Refined Local-Global Network for Super-Resolution of Hyperspectral ImageryZhenjie Tang, Qing Xu, Zhenwei Shi et al.
With the development of deep learning technology, multi-spectral image super-resolution methods based on convolutional neural network have recently achieved great progress. However, the single hyperspectral image super-resolution remains a challenging problem due to the high-dimensional and complex spectral characteristics of hyperspectral data, which make it difficult to simultaneously capture spatial and spectral information. To deal with this issue, we propose a novel Feedback Refined Local-Global Network (FRLGN) for the super-resolution of hyperspectral image. To be specific, we develop a new Feedback Structure and a Local-Global Spectral Block to alleviate the difficulty in spatial and spectral feature extraction. The Feedback Structure can transfer the high-level information to guide the generation process of low-level feature, which is achieved by a recurrent structure with finite unfoldings. Furthermore, in order to effectively use the high-level information passed back, a Local-Global Spectral Block is constructed to handle the feedback connections. The Local-Global Spectral Block utilizes the feedback high-level information to correct the low-level feature from local spectral bands and generates powerful high-level representations among global spectral bands. By incorporating the Feedback Structure and Local-Global Spectral Block, the FRLGN can fully exploit spatial-spectral correlations among spectral bands and gradually reconstruct high-resolution hyperspectral images. The source code of FRLGN is available at https://github.com/tangzhenjie/FRLGN.
IVMay 27, 2025
Multitemporal Latent Dynamical Framework for Hyperspectral Images UnmixingRuiying Li, Bin Pan, Lan Ma et al.
Multitemporal hyperspectral unmixing can capture dynamical evolution of materials. Despite its capability, current methods emphasize variability of endmembers while neglecting dynamics of abundances, which motivates our adoption of neural ordinary differential equations to model abundances temporally. However, this motivation is hindered by two challenges: the inherent complexity in defining, modeling and solving problem, and the absence of theoretical support. To address above challenges, in this paper, we propose a multitemporal latent dynamical (MiLD) unmixing framework by capturing dynamical evolution of materials with theoretical validation. For addressing multitemporal hyperspectral unmixing, MiLD consists of problem definition, mathematical modeling, solution algorithm and theoretical support. We formulate multitemporal unmixing problem definition by conducting ordinary differential equations and developing latent variables. We transfer multitemporal unmixing to mathematical model by dynamical discretization approaches, which describe the discreteness of observed sequence images with mathematical expansions. We propose algorithm to solve problem and capture dynamics of materials, which approximates abundance evolution by neural networks. Furthermore, we provide theoretical support by validating the crucial properties, which verifies consistency, convergence and stability theorems. The major contributions of MiLD include defining problem by ordinary differential equations, modeling problem by dynamical discretization approach, solving problem by multitemporal unmixing algorithm, and presenting theoretical support. Our experiments on both synthetic and real datasets have validated the utility of our work
CVNov 28, 2025
Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory RecordsShiyu Shen, Zhe Gao, Taifeng Chai et al.
Deep learning has revolutionized solar image analysis, yet most approaches train task-specific encoders from scratch or rely on natural-image pretraining that ignores the unique characteristics of Solar Dynamics Observatory (SDO) data. We introduce SolarCHIP, a family of contrastively pretrained visual backbones tailored to multi-instrument SDO observations. SolarCHIP addresses three key challenges in solar imaging: multimodal sensing across AIA and HMI instruments, weak inter-class separability due to slow temporal evolution, and strong intra-class variability with sparse activity signals. Our pretraining framework employs a multi-granularity contrastive objective that jointly aligns (1) global class tokens across co-temporal AIA-HMI pairs to enhance temporal discrimination, (2) local patch tokens at fixed spatial indices to enforce position-consistent, modality-invariant features, and (3) intra-sample patches across different spatial locations to preserve fine-grained spatial structure. We train both CNN- and Vision Transformer-based autoencoders and demonstrate their effectiveness on two downstream tasks: cross-modal translation between HMI and AIA passbands via ControlNet, and full-disk flare classification. Experimental results show that SolarCHIP achieves state-of-the-art performance across both tasks, with particularly strong gains in low-resource settings where labeled data is limited. Ablation studies confirm that each contrastive component contributes essential discriminative capacity at different granularities. By publicly releasing pretrained weights and training code, we provide the heliophysics community with a practical, plug-and-play feature extractor that reduces computational requirements, improves label efficiency, and establishes a reusable foundation for diverse solar imaging applications.
IVNov 22, 2025
Spectral Super-Resolution Neural Operator with Atmospheric Radiative Transfer PriorZiye Zhang, Bin Pan, Zhenwei Shi
Spectral super-resolution (SSR) aims to reconstruct hyperspectral images (HSIs) from multispectral observations, with broad applications in remote sensing. Data-driven methods are widely used, but they often overlook physical principles, leading to unrealistic spectra, particularly in atmosphere-affected bands. To address this challenge, we propose the Spectral Super-Resolution Neural Operator (SSRNO), which incorporates atmospheric radiative transfer (ART) prior into the data-driven procedure, yielding more physically consistent predictions. The proposed SSRNO framework consists of three stages: upsampling, reconstruction, and refinement. In the upsampling stage, we leverage prior information to expand the input multispectral image, producing a physically plausible hyperspectral estimate. Subsequently, we utilize a neural operator in the reconstruction stage to learn a continuous mapping across the spectral domain. Finally, the refinement stage imposes a hard constraint on the output HSI to eliminate color distortion. The upsampling and refinement stages are implemented via the proposed guidance matrix projection (GMP) method, and the reconstruction neural operator adopts U-shaped spectral-aware convolution (SAC) layers to capture multi-scale features. Moreover, we theoretically demonstrate the optimality of the GMP method. With the neural operator and ART priors, SSRNO also achieves continuous spectral reconstruction and zero-shot extrapolation. Various experiments validate the effectiveness and generalization ability of the proposed approach.
CVSep 14, 2025
A Copula-Guided Temporal Dependency Method for Multitemporal Hyperspectral Images UnmixingRuiying Li, Bin Pan, Qiaoying Qu et al.
Multitemporal hyperspectral unmixing (MTHU) aims to model variable endmembers and dynamical abundances, which emphasizes the critical temporal information. However, existing methods have limitations in modeling temporal dependency, thus fail to capture the dynamical material evolution. Motivated by the ability of copula theory in modeling dependency structure explicitly, in this paper, we propose a copula-guided temporal dependency method (Cog-TD) for multitemporal hyperspectral unmixing. Cog-TD defines new mathematical model, constructs copula-guided framework and provides two key modules with theoretical support. The mathematical model provides explicit formulations for MTHU problem definition, which describes temporal dependency structure by incorporating copula theory. The copula-guided framework is constructed for utilizing copula function, which estimates dynamical endmembers and abundances with temporal dependency. The key modules consist of copula function estimation and temporal dependency guidance, which computes and employs temporal information to guide unmixing process. Moreover, the theoretical support demonstrates that estimated copula function is valid and the represented temporal dependency exists in hyperspectral images. The major contributions of this paper include redefining MTHU problem with temporal dependency, proposing a copula-guided framework, developing two key modules and providing theoretical support. Our experimental results on both synthetic and real-world datasets demonstrate the utility of the proposed method.
CVSep 14, 2025
SMILE: A Super-resolution Guided Multi-task Learning Method for Hyperspectral UnmixingRuiying Li, Bin Pan, Qiaoying Qu et al.
The performance of hyperspectral unmixing may be constrained by low spatial resolution, which can be enhanced using super-resolution in a multitask learning way. However, integrating super-resolution and unmixing directly may suffer two challenges: Task affinity is not verified, and the convergence of unmixing is not guaranteed. To address the above issues, in this paper, we provide theoretical analysis and propose super-resolution guided multi-task learning method for hyperspectral unmixing (SMILE). The provided theoretical analysis validates feasibility of multitask learning way and verifies task affinity, which consists of relationship and existence theorems by proving the positive guidance of super-resolution. The proposed framework generalizes positive information from super-resolution to unmixing by learning both shared and specific representations. Moreover, to guarantee the convergence, we provide the accessibility theorem by proving the optimal solution of unmixing. The major contributions of SMILE include providing progressive theoretical support, and designing a new framework for unmixing under the guidance of super-resolution. Our experiments on both synthetic and real datasets have substantiate the usefulness of our work.
CVAug 23, 2025
Preserving Domain Generalization in Fine-Tuning via Joint Parameter SelectionBin Pan, Shiyu Shen, Zongbin Wang et al.
Domain generalization seeks to develop models trained on a limited set of source domains that are capable of generalizing effectively to unseen target domains. While the predominant approach leverages large-scale pre-trained vision models as initialization, recent studies have highlighted that full fine-tuning can compromise the intrinsic generalization capabilities of these models. To address this limitation, parameter-efficient adaptation strategies have emerged, wherein only a subset of model parameters is selectively fine-tuned, thereby balancing task adaptation with the preservation of generalization. Motivated by this paradigm, we introduce Joint Parameter Selection (JPS), a novel method that restricts updates to a small, sparse subset of parameters, thereby retaining and harnessing the generalization strength of pre-trained models. Theoretically, we establish a generalization error bound that explicitly accounts for the sparsity of parameter updates, thereby providing a principled justification for selective fine-tuning. Practically, we design a selection mechanism employing dual operators to identify and update parameters exhibiting consistent and significant gradients across all source domains. Extensive benchmark experiments demonstrate that JPS achieves superior performance compared to state-of-the-art domain generalization methods, substantiating both the efficiency and efficacy of the proposed approach.
CVJun 3, 2025
Hyperspectral Image Generation with Unmixing Guided Diffusion ModelShiyu Shen, Bin Pan, Ziye Zhang et al.
We address hyperspectral image (HSI) synthesis, a problem that has garnered growing interest yet remains constrained by the conditional generative paradigms that limit sample diversity. While diffusion models have emerged as a state-of-the-art solution for high-fidelity image generation, their direct extension from RGB to hyperspectral domains is challenged by the high spectral dimensionality and strict physical constraints inherent to HSIs. To overcome the challenges, we introduce a diffusion framework explicitly guided by hyperspectral unmixing. The approach integrates two collaborative components: (i) an unmixing autoencoder that projects generation from the image domain into a low-dimensional abundance manifold, thereby reducing computational burden while maintaining spectral fidelity; and (ii) an abundance diffusion process that enforces non-negativity and sum-to-one constraints, ensuring physical consistency of the synthesized data. We further propose two evaluation metrics tailored to hyperspectral characteristics. Comprehensive experiments, assessed with both conventional measures and the proposed metrics, demonstrate that our method produces HSIs with both high quality and diversity, advancing the state of the art in hyperspectral data generation.
LGJun 3, 2025
A Pretrained Probabilistic Transformer for City-Scale Traffic Volume PredictionShiyu Shen, Bin Pan, Guirong Xue
City-scale traffic volume prediction plays a pivotal role in intelligent transportation systems, yet remains a challenge due to the inherent incompleteness and bias in observational data. Although deep learning-based methods have shown considerable promise, most existing approaches produce deterministic point estimates, thereby neglecting the uncertainty arising from unobserved traffic flows. Furthermore, current models are typically trained in a city-specific manner, which hinders their generalizability and limits scalability across diverse urban contexts. To overcome these limitations, we introduce TrafficPPT, a Pretrained Probabilistic Transformer designed to model traffic volume as a distributional aggregation of trajectories. Our framework fuses heterogeneous data sources-including real-time observations, historical trajectory data, and road network topology-enabling robust and uncertainty-aware traffic inference. TrafficPPT is initially pretrained on large-scale simulated data spanning multiple urban scenarios, and later fine-tuned on target cities to ensure effective domain adaptation. Experiments on real-world datasets show that TrafficPPT consistently surpasses state-of-the-art baselines, particularly under conditions of extreme data sparsity. Code will be open.
LGJun 9, 2024
Domain Generalization Guided by Large-Scale Pre-Trained PriorsZongbin Wang, Bin Pan, Shiyu Shen et al.
Domain generalization (DG) aims to train a model from limited source domains, allowing it to generalize to unknown target domains. Typically, DG models only employ large-scale pre-trained models during the initialization of fine-tuning. However, large-scale pre-trained models already possess the ability to resist domain shift. If we reference pre-trained models continuously during fine-tuning to maintain this ability, it could further enhance the generalization ability of the DG model. For this purpose, we introduce a new method called Fine-Tune with Large-scale pre-trained Priors (FT-LP), which incorporates the pre-trained model as a prior into the DG fine-tuning process, ensuring that the model refers to its pre-trained model at each optimization step. FT-LP comprises a theoretical framework and a simple implementation strategy. In theory, we verify the rationality of FT-LP by introducing a generalization error bound with the pre-trained priors for DG. In implementation, we utilize an encoder to simulate the model distribution, enabling the use of FT-LP when only pre-trained weights are available. In summary, we offer a new fine-tuning method for DG algorithms to utilize pre-trained models throughout the fine-tuning process. Through experiments on various datasets and DG models, our proposed method exhibits significant improvements, indicating its effectiveness.
LGJun 9, 2024
Domain Agnostic Conditional Invariant Predictions for Domain GeneralizationZongbin Wang, Bin Pan, Zhenwei Shi
Domain generalization aims to develop a model that can perform well on unseen target domains by learning from multiple source domains. However, recent-proposed domain generalization models usually rely on domain labels, which may not be available in many real-world scenarios. To address this challenge, we propose a Discriminant Risk Minimization (DRM) theory and the corresponding algorithm to capture the invariant features without domain labels. In DRM theory, we prove that reducing the discrepancy of prediction distribution between overall source domain and any subset of it can contribute to obtaining invariant features. To apply the DRM theory, we develop an algorithm which is composed of Bayesian inference and a new penalty termed as Categorical Discriminant Risk (CDR). In Bayesian inference, we transform the output of the model into a probability distribution to align with our theoretical assumptions. We adopt sliding update approach to approximate the overall prediction distribution of the model, which enables us to obtain CDR penalty. We also indicate the effectiveness of these components in finding invariant features. We evaluate our algorithm against various domain generalization methods on multiple real-world datasets, providing empirical support for our theory.