77.9CVApr 16
The Fourth Challenge on Image Super-Resolution ($\times$4) at NTIRE 2026: Benchmark Results and Method OverviewZheng Chen, Kai Liu, Jingkai Wang et al.
This paper presents the NTIRE 2026 image super-resolution ($\times$4) challenge, one of the associated competitions of the NTIRE 2026 Workshop at CVPR 2026. The challenge aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective super-resolution solutions and analyze recent advances in the field. To reflect the evolving objectives of image super-resolution, the challenge includes two tracks: (1) a restoration track, which emphasizes pixel-wise fidelity and ranks submissions based on PSNR; and (2) a perceptual track, which focuses on visual realism and evaluates results using a perceptual score. A total of 194 participants registered for the challenge, with 31 teams submitting valid entries. This report summarizes the challenge design, datasets, evaluation protocol, main results, and methods of participating teams. The challenge provides a unified benchmark and offers insights into current progress and future directions in image super-resolution.
30.6CVApr 13Code
Degradation-Aware and Structure-Preserving Diffusion for Real-World Image Super-ResolutionYang Ji, Zonghao Chen, Zhihao Xue et al.
Real-world image super-resolution is particularly challenging for diffusion models because real degradations are complex, heterogeneous, and rarely modeled explicitly. We propose a degradation-aware and structure-preserving diffusion framework for real-world SR. Specifically, we introduce Degradation-aware Token Injection, which encodes lightweight degradation statistics from low-resolution inputs and fuses them with semantic conditioning features, enabling explicit degradation-aware restoration. We further propose Spatially Asymmetric Noise Injection, which modulates diffusion noise with local edge strength to better preserve structural regions during training. Both modules are lightweight add-ons to the adopted diffusion SR framework, requiring only minor modifications to the conditioning pipeline. Experiments on DIV2K and RealSR show that our method delivers competitive no-reference perceptual quality and visually more realistic restoration results than recent baselines, while maintaining a favorable perception--distortion trade-off. Ablations confirm the effectiveness of each module and their complementary gains when combined. The code and model are publicly available at https://github.com/jiyang0315/DASP-SR.git.
LGAug 21, 2023
centroIDA: Cross-Domain Class Discrepancy Minimization Based on Accumulative Class-Centroids for Imbalanced Domain AdaptationXiaona Sun, Zhenyu Wu, Yichen Liu et al.
Unsupervised Domain Adaptation (UDA) approaches address the covariate shift problem by minimizing the distribution discrepancy between the source and target domains, assuming that the label distribution is invariant across domains. However, in the imbalanced domain adaptation (IDA) scenario, covariate and long-tailed label shifts both exist across domains. To tackle the IDA problem, some current research focus on minimizing the distribution discrepancies of each corresponding class between source and target domains. Such methods rely much on the reliable pseudo labels' selection and the feature distributions estimation for target domain, and the minority classes with limited numbers makes the estimations more uncertainty, which influences the model's performance. In this paper, we propose a cross-domain class discrepancy minimization method based on accumulative class-centroids for IDA (centroIDA). Firstly, class-based re-sampling strategy is used to obtain an unbiased classifier on source domain. Secondly, the accumulative class-centroids alignment loss is proposed for iterative class-centroids alignment across domains. Finally, class-wise feature alignment loss is used to optimize the feature representation for a robust classification boundary. A series of experiments have proved that our method outperforms other SOTA methods on IDA problem, especially with the increasing degree of label shift.
LGAug 16, 2023
Dual-Branch Temperature Scaling Calibration for Long-Tailed RecognitionJialin Guo, Zhenyu Wu, Zhiqiang Zhan et al.
The calibration for deep neural networks is currently receiving widespread attention and research. Miscalibration usually leads to overconfidence of the model. While, under the condition of long-tailed distribution of data, the problem of miscalibration is more prominent due to the different confidence levels of samples in minority and majority categories, and it will result in more serious overconfidence. To address this problem, some current research have designed diverse temperature coefficients for different categories based on temperature scaling (TS) method. However, in the case of rare samples in minority classes, the temperature coefficient is not generalizable, and there is a large difference between the temperature coefficients of the training set and the validation set. To solve this challenge, this paper proposes a dual-branch temperature scaling calibration model (Dual-TS), which considers the diversities in temperature parameters of different categories and the non-generalizability of temperature parameters for rare samples in minority classes simultaneously. Moreover, we noticed that the traditional calibration evaluation metric, Excepted Calibration Error (ECE), gives a higher weight to low-confidence samples in the minority classes, which leads to inaccurate evaluation of model calibration. Therefore, we also propose Equal Sample Bin Excepted Calibration Error (Esbin-ECE) as a new calibration evaluation metric. Through experiments, we demonstrate that our model yields state-of-the-art in both traditional ECE and Esbin-ECE metrics.
LGJan 26, 2025Code
A Comprehensive Survey on Self-Interpretable Neural NetworksYang Ji, Ying Sun, Yuting Zhang et al.
Neural networks have achieved remarkable success across various fields. However, the lack of interpretability limits their practical use, particularly in critical decision-making scenarios. Post-hoc interpretability, which provides explanations for pre-trained models, is often at risk of robustness and fidelity. This has inspired a rising interest in self-interpretable neural networks, which inherently reveal the prediction rationale through the model structures. Although there exist surveys on post-hoc interpretability, a comprehensive and systematic survey of self-interpretable neural networks is still missing. To address this gap, we first collect and review existing works on self-interpretable neural networks and provide a structured summary of their methodologies from five key perspectives: attribution-based, function-based, concept-based, prototype-based, and rule-based self-interpretation. We also present concrete, visualized examples of model explanations and discuss their applicability across diverse scenarios, including image, text, graph data, and deep reinforcement learning. Additionally, we summarize existing evaluation metrics for self-interpretability and identify open challenges in this field, offering insights for future research. To support ongoing developments, we present a publicly accessible resource to track advancements in this domain: https://github.com/yangji721/Awesome-Self-Interpretable-Neural-Network.
CVMar 2
LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous DrivingYuechen Luo, Fang Li, Shaoqing Xu et al.
While Vision-Language-Action (VLA) models have revolutionized autonomous driving by unifying perception and planning, their reliance on explicit textual Chain-of-Thought (CoT) leads to semantic-perceptual decoupling and perceptual-symbolic conflicts. Recent shifts toward latent reasoning attempt to bypass these bottlenecks by thinking in continuous hidden space. However, without explicit intermediate constraints, standard latent CoT often operates as a physics-agnostic representation. To address this, we propose the Latent Spatio-Temporal VLA (LaST-VLA), a framework shifting the reasoning paradigm from discrete symbolic processing into a physically grounded Latent Spatio-Temporal CoT. By implementing a dual-feature alignment mechanism, we distill geometric constraints from 3D foundation models and dynamic foresight from world models directly into the latent space. Coupled with a progressive SFT training strategy that transitions from feature alignment to trajectory generation, and refined via Reinforcement Learning with Group Relative Policy Optimization (GRPO) to ensure safety and rule compliance. \method~setting a new record on NAVSIM v1 (91.3 PDMS) and NAVSIM v2 (87.1 EPDMS), while excelling in spatial-temporal reasoning on SURDS and NuDynamics benchmarks.
CVMar 9
BrainCast: A Spatio-Temporal Forecasting Model for Whole-Brain fMRI Time Series PredictionYunlong Gao, Jinbo Yang, Li Xiao et al.
Functional magnetic resonance imaging (fMRI) enables noninvasive investigation of brain function, while short clinical scan durations, arising from human and non-human factors, usually lead to reduced data quality and limited statistical power for neuroimaging research. In this paper, we propose BrainCast, a novel spatio-temporal forecasting framework specifically tailored for whole-brain fMRI time series forecasting, to extend informative fMRI time series without additional data acquisition. It formulates fMRI time series forecasting as a multivariate time series prediction task and jointly models temporal dynamics within regions of interest (ROIs) and spatial interactions across ROIs. Specifically, BrainCast integrates a Spatial Interaction Awareness module to characterize inter-ROI dependencies via embedding every ROI time series as a token, a Temporal Feature Refinement module to capture intrinsic neural dynamics within each ROI by enhancing both low- and high-energy temporal components of fMRI time series at the ROI level, and a Spatio-temporal Pattern Alignment module to combine spatial and temporal representations for producing informative whole-brain features. Experimental results on resting-state and task fMRI datasets from the Human Connectome Project demonstrate the superiority of BrainCast over state-of-the-art time series forecasting baselines. Moreover, fMRI time series extended by BrainCast improve downstream cognitive ability prediction, highlighting the clinical and neuroscientific impact brought by whole-brain fMRI time series forecasting in scenarios with restricted scan durations.
CVDec 29, 2024Code
Contrastive Conditional Alignment based on Label Shift Calibration for Imbalanced Domain AdaptationXiaona Sun, Zhenyu Wu, Zhiqiang Zhan et al.
Many existing unsupervised domain adaptation (UDA) methods primarily focus on covariate shift, limiting their effectiveness in imbalanced domain adaptation (IDA) where both covariate shift and label shift coexist. Recent IDA methods have achieved promising results based on self-training using target pseudo labels. However, under the IDA scenarios, the classifier learned in the source domain will exhibit different decision bias from the target domain. It will potentially make target pseudo labels unreliable, and will further lead to error accumulation with incorrect class alignment. Thus, we propose contrastive conditional alignment based on label shift calibration (CCA-LSC) for IDA, to address both covariate shift and label shift. Initially, our contrastive conditional alignment resolve covariate shift to learn representations with domain invariance and class discriminability, which include domain adversarial learning, sample-weighted moving average centroid alignment and discriminative feature alignment. Subsequently, we estimate the probability distribution of the target domain, and calibrate target sample classification predictions based on label shift metrics to encourage labeling pseudo-labels more consistently with the distribution of real target data. Extensive experiments are conducted and demonstrate that our method outperforms existing UDA and IDA methods on benchmarks with both label shift and covariate shift. Our code is available at https://github.com/ysxcj-hub/CCA-LSC.
38.2CVApr 13
LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental LearningLinjie Li, Zhenyu Wu, Huiyu Xiao et al.
Prompt-based class-incremental learning methods typically construct a prompt pool consisting of multiple trainable key-prompts and perform instance-level matching to select the most suitable prompt embeddings, which has shown promising results. However, existing approaches face several limitations, including fixed prompt pools, manual selection of prompt embeddings, and strong reliance on the pretrained backbone for prompt selection. To address these issues, we propose a \textbf{L}ayer-importance guided \textbf{D}ual \textbf{E}xpandable \textbf{P}rompt Pool (\textbf{LDEPrompt}), which enables adaptive layer selection as well as dynamic freezing and expansion of the prompt pool. Extensive experiments on widely used class-incremental learning benchmarks demonstrate that LDEPrompt achieves state-of-the-art performance, validating its effectiveness and scalability.
77.2LGApr 13
Quantum-Gated Task-interaction Knowledge Distillation for Pre-trained Model-based Class-Incremental LearningLinjie Li, Huiyu Xiao, Jiarui Cao et al.
Class-incremental learning (CIL) aims to continuously accumulate knowledge from a stream of tasks and construct a unified classifier over all seen classes. Although pretrained models (PTMs) have shown promising performance in CIL, they still struggle with the entanglement of multi-task subspaces, leading to catastrophic forgetting when task routing parameters are poorly calibrated or task-level representations are rigidly fixed. To address this issue, we propose a novel Quantum-Gated Task-interaction Knowledge Distillation (QKD) framework that leverages quantum gating to guide inter-task knowledge transfer. Specifically, we introduce a quantum-gated task modulation gating mechanism to model the relational dependencies among task embedding, dynamically capturing the sample-to-task relevance for both joint training and inference across streaming tasks. Guided by the quantum gating outputs, we perform task-interaction knowledge distillation guided by these task-embedding-level correlation weights from old to new adapters, enabling the model to bridge the representation gaps between independent task subspaces. Extensive experiments demonstrate that QKD effectively mitigates forgetting and achieves state-of-the-art performance.
LGMay 21, 2025Code
MoTE: Mixture of Task-specific Experts for Pre-Trained ModelBased Class-incremental LearningLinjie Li, Zhenyu Wu, Yang Ji
Class-incremental learning (CIL) requires deep learning models to continuously acquire new knowledge from streaming data while preserving previously learned information. Recently, CIL based on pre-trained models (PTMs) has achieved remarkable success. However, prompt-based approaches suffer from prompt overwriting, while adapter-based methods face challenges such as dimensional misalignment between tasks. While the idea of expert fusion in Mixture of Experts (MoE) can help address dimensional inconsistency, both expert and routing parameters are prone to being overwritten in dynamic environments, making MoE challenging to apply directly in CIL. To tackle these issues, we propose a mixture of task-specific experts (MoTE) framework that effectively mitigates the miscalibration caused by inconsistent output dimensions across tasks. Inspired by the weighted feature fusion and sparse activation mechanisms in MoE, we introduce task-aware expert filtering and reliable expert joint inference during the inference phase, mimicking the behavior of routing layers without inducing catastrophic forgetting. Extensive experiments demonstrate the superiority of our method without requiring an exemplar set. Furthermore, the number of tasks in MoTE scales linearly with the number of adapters. Building on this, we further explore the trade-off between adapter expansion and model performance and propose the Adapter-Limited MoTE. The code is available at https://github.com/Franklilinjie/MoTE.
LGMar 13, 2024
Decoupled Federated Learning on Long-Tailed and Non-IID data with Feature StatisticsZhuoxin Chen, Zhenyu Wu, Yang Ji
Federated learning is designed to enhance data security and privacy, but faces challenges when dealing with heterogeneous data in long-tailed and non-IID distributions. This paper explores an overlooked scenario where tail classes are sparsely distributed over a few clients, causing the models trained with these classes to have a lower probability of being selected during client aggregation, leading to slower convergence rates and poorer model performance. To address this issue, we propose a two-stage Decoupled Federated learning framework using Feature Statistics (DFL-FS). In the first stage, the server estimates the client's class coverage distributions through masked local feature statistics clustering to select models for aggregation to accelerate convergence and enhance feature learning without privacy leakage. In the second stage, DFL-FS employs federated feature regeneration based on global feature statistics and utilizes resampling and weighted covariance to calibrate the global classifier to enhance the model's adaptability to long-tailed data distributions. We conducted experiments on CIFAR10-LT and CIFAR100-LT datasets with various long-tailed rates. The results demonstrate that our method outperforms state-of-the-art methods in both accuracy and convergence rate.
LGMar 17, 2025
Enhancing Job Salary Prediction with Disentangled Composition Effect Modeling: A Neural Prototyping ApproachYang Ji, Ying Sun, Hengshu Zhu
In the era of the knowledge economy, understanding how job skills influence salary is crucial for promoting recruitment with competitive salary systems and aligned salary expectations. Despite efforts on salary prediction based on job positions and talent demographics, there still lacks methods to effectively discern the set-structured skills' intricate composition effect on job salary. While recent advances in neural networks have significantly improved accurate set-based quantitative modeling, their lack of explainability hinders obtaining insights into the skills' composition effects. Indeed, model explanation for set data is challenging due to the combinatorial nature, rich semantics, and unique format. To this end, in this paper, we propose a novel intrinsically explainable set-based neural prototyping approach, namely \textbf{LGDESetNet}, for explainable salary prediction that can reveal disentangled skill sets that impact salary from both local and global perspectives. Specifically, we propose a skill graph-enhanced disentangled discrete subset selection layer to identify multi-faceted influential input subsets with varied semantics. Furthermore, we propose a set-oriented prototype learning method to extract globally influential prototypical sets. The resulting output is transparently derived from the semantic interplay between these input subsets and global prototypes. Extensive experiments on four real-world datasets demonstrate that our method achieves superior performance than state-of-the-art baselines in salary prediction while providing explainable insights into salary-influencing patterns.
CVNov 25, 2025
LiMT: A Multi-task Liver Image Benchmark DatasetZhe Liu, Kai Han, Siqi Ma et al.
Computer-aided diagnosis (CAD) technology can assist clinicians in evaluating liver lesions and intervening with treatment in time. Although CAD technology has advanced in recent years, the application scope of existing datasets remains relatively limited, typically supporting only single tasks, which has somewhat constrained the development of CAD technology. To address the above limitation, in this paper, we construct a multi-task liver dataset (LiMT) used for liver and tumor segmentation, multi-label lesion classification, and lesion detection based on arterial phase-enhanced computed tomography (CT), potentially providing an exploratory solution that is able to explore the correlation between tasks and does not need to worry about the heterogeneity between task-specific datasets during training. The dataset includes CT volumes from 150 different cases, comprising four types of liver diseases as well as normal cases. Each volume has been carefully annotated and calibrated by experienced clinicians. This public multi-task dataset may become a valuable resource for the medical imaging research community in the future. In addition, this paper not only provides relevant baseline experimental results but also reviews existing datasets and methods related to liver-related tasks. Our dataset is available at https://drive.google.com/drive/folders/1l9HRK13uaOQTNShf5pwgSz3OTanWjkag?usp=sharing.
CLSep 20, 2025
Semi-Supervised Synthetic Data Generation with Fine-Grained Relevance Control for Short Video Search Relevance ModelingHaoran Li, Zhiming Su, Junyan Yao et al.
Synthetic data is widely adopted in embedding models to ensure diversity in training data distributions across dimensions such as difficulty, length, and language. However, existing prompt-based synthesis methods struggle to capture domain-specific data distributions, particularly in data-scarce domains, and often overlook fine-grained relevance diversity. In this paper, we present a Chinese short video dataset with 4-level relevance annotations, filling a critical resource void. Further, we propose a semi-supervised synthetic data pipeline where two collaboratively trained models generate domain-adaptive short video data with controllable relevance labels. Our method enhances relevance-level diversity by synthesizing samples for underrepresented intermediate relevance labels, resulting in a more balanced and semantically rich training data set. Extensive offline experiments show that the embedding model trained on our synthesized data outperforms those using data generated based on prompting or vanilla supervised fine-tuning(SFT). Moreover, we demonstrate that incorporating more diverse fine-grained relevance levels in training data enhances the model's sensitivity to subtle semantic distinctions, highlighting the value of fine-grained relevance supervision in embedding learning. In the search enhanced recommendation pipeline of Douyin's dual-column scenario, through online A/B testing, the proposed model increased click-through rate(CTR) by 1.45%, raised the proportion of Strong Relevance Ratio (SRR) by 4.9%, and improved the Image User Penetration Rate (IUPR) by 0.1054%.
CVFeb 8, 2024
TaE: Task-aware Expandable Representation for Long Tail Class Incremental LearningLinjie Li, Zhenyu Wu, Jiaming Liu et al.
Class-incremental learning is dedicated to the development of deep learning models that are capable of acquiring new knowledge while retaining previously learned information. Most methods focus on balanced data distribution for each task, overlooking real-world long-tailed distributions. Therefore, Long-Tailed Class-Incremental Learning has been introduced, which trains on data where head classes have more samples than tail classes. Existing methods mainly focus on preserving representative samples from previous classes to combat catastrophic forgetting. Recently, dynamic network algorithms freeze old network structures and expand new ones, achieving significant performance. However, with the introduction of the long-tail problem, merely extending Determined blocks can lead to miscalibrated predictions, while expanding the entire backbone results in an explosion of memory size. To address these issues, we introduce a novel Task-aware Expandable (TaE) framework, dynamically allocating and updating task-specific trainable parameters to learn diverse representations from each incremental task while resisting forgetting through the majority of frozen model parameters. To further encourage the class-specific feature representation, we develop a Centroid-Enhanced (CEd) method to guide the update of these task-aware parameters. This approach is designed to adaptively allocate feature space for every class by adjusting the distance between intra- and inter-class features, which can extend to all "training from sketch" algorithms. Extensive experiments demonstrate that TaE achieves state-of-the-art performance.
AIAug 10, 2021
Multi-Factors Aware Dual-Attentional Knowledge TracingMoyu Zhang, Xinning Zhu, Chunhong Zhang et al.
With the increasing demands of personalized learning, knowledge tracing has become important which traces students' knowledge states based on their historical practices. Factor analysis methods mainly use two kinds of factors which are separately related to students and questions to model students' knowledge states. These methods use the total number of attempts of students to model students' learning progress and hardly highlight the impact of the most recent relevant practices. Besides, current factor analysis methods ignore rich information contained in questions. In this paper, we propose Multi-Factors Aware Dual-Attentional model (MF-DAKT) which enriches question representations and utilizes multiple factors to model students' learning progress based on a dual-attentional mechanism. More specifically, we propose a novel student-related factor which records the most recent attempts on relevant concepts of students to highlight the impact of recent exercises. To enrich questions representations, we use a pre-training method to incorporate two kinds of question information including questions' relation and difficulty level. We also add a regularization term about questions' difficulty level to restrict pre-trained question representations to fine-tuning during the process of predicting students' performance. Moreover, we apply a dual-attentional mechanism to differentiate contributions of factors and factor interactions to final prediction in different practice records. At last, we conduct experiments on several real-world datasets and results show that MF-DAKT can outperform existing knowledge tracing methods. We also conduct several studies to validate the effects of each component of MF-DAKT.
LGNov 19, 2018
An Adaptive Oversampling Learning Method for Class-Imbalanced Fault Diagnostics and PrognosticsWenfang Lin, Zhenyu Wu, Yang Ji
Data-driven fault diagnostics and prognostics suffers from class-imbalance problem in industrial systems and it raises challenges to common machine learning algorithms as it becomes difficult to learn the features of the minority class samples. Synthetic oversampling methods are commonly used to tackle these problems by generating the minority class samples to balance the distributions between majority and minority classes. However, many of oversampling methods are inappropriate that they cannot generate effective and useful minority class samples according to different distributions of data, which further complicate the process of learning samples. Thus, this paper proposes a novel adaptive oversampling technique: EM-based Weighted Minority Oversampling TEchnique (EWMOTE) for industrial fault diagnostics and prognostics. The methods comprises a weighted minority sampling strategy to identify hard-to-learn informative minority fault samples and Expectation Maximization (EM) based imputation algorithm to generate fault samples. To validate the performance of the proposed methods, experiments are conducted in two real datasets. The results show that the method could achieve better performance on not only binary class, but multi-class imbalance learning task in different imbalance ratios than other oversampling-based baseline models.