Qianlong Xiang

CV
4papers
44citations
Novelty57%
AI Score52

4 Papers

87.4CVApr 16Code
NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results

Andrey Moskalenko, Alexey Bryncev, Ivan Kosmynin et al.

This paper presents an overview of the NTIRE 2026 Challenge on Video Saliency Prediction. The goal of the challenge participants was to develop automatic saliency map prediction methods for the provided video sequences. The novel dataset of 2,000 diverse videos with an open license was prepared for this challenge. The fixations and corresponding saliency maps were collected using crowdsourced mouse tracking and contain viewing data from over 5,000 assessors. Evaluation was performed on a subset of 800 test videos using generally accepted quality metrics. The challenge attracted over 20 teams making submissions, and 7 teams passed the final phase with code review. All data used in this challenge is made publicly available - https://github.com/msu-video-group/NTIRE26_Saliency_Prediction.

CVSep 5, 2024
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture

Qianlong Xiang, Miao Zhang, Yuzhang Shang et al.

Diffusion models (DMs) have demonstrated exceptional generative capabilities across various domains, including image, video, and so on. A key factor contributing to their effectiveness is the high quantity and quality of data used during training. However, mainstream DMs now consume increasingly large amounts of data. For example, training a Stable Diffusion model requires billions of image-text pairs. This enormous data requirement poses significant challenges for training large DMs due to high data acquisition costs and storage expenses. To alleviate this data burden, we propose a novel scenario: using existing DMs as data sources to train new DMs with any architecture. We refer to this scenario as Data-Free Knowledge Distillation for Diffusion Models (DKDM), where the generative ability of DMs is transferred to new ones in a data-free manner. To tackle this challenge, we make two main contributions. First, we introduce a DKDM objective that enables the training of new DMs via distillation, without requiring access to the data. Second, we develop a dynamic iterative distillation method that efficiently extracts time-domain knowledge from existing DMs, enabling direct retrieval of training data without the need for a prolonged generative process. To the best of our knowledge, we are the first to explore this scenario. Experimental results demonstrate that our data-free approach not only achieves competitive generative performance but also, in some instances, outperforms models trained with the entire dataset.

44.0CVApr 9Code
ViSAGE @ NTIRE 2026 Challenge on Video Saliency Prediction

Kun Wang, Yupeng Hu, Zhiran Li et al.

In this report, we present our champion solution for the NTIRE 2026 Challenge on Video Saliency Prediction held in conjunction with CVPR 2026. To exploit complementary inductive biases for video saliency, we propose Video Saliency with Adaptive Gated Experts (ViSAGE), a multi-expert ensemble framework. Each specialized decoder performs adaptive gating and modulation to refine spatio-temporal features. The complementary predictions from different experts are then fused at inference. ViSAGE thereby aggregates diverse inductive biases to capture complex spatio-temporal saliency cues in videos. On the Private Test set, ViSAGE ranked first on two out of four evaluation metrics, and outperformed most competing solutions on the other two metrics, demonstrating its effectiveness and generalization ability. Our code has been released at https://github.com/iLearn-Lab/CVPRW26-ViSAGE.

77.4CVMar 18
TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al.

Although text-to-image diffusion models exhibit remarkable generative power, concept erasure techniques are essential for their safe deployment to prevent the creation of harmful content. This has fostered a dynamic interplay between the development of erasure defenses and the adversarial probes designed to bypass them, and this co-evolution has progressively enhanced the efficacy of erasure methods. However, this adversarial co-evolution has converged on a narrow, text-centric paradigm that equates erasure with severing the text-to-image mapping, ignoring that the underlying visual knowledge related to undesired concepts still persist. To substantiate this claim, we investigate from a visual perspective, leveraging DDIM inversion to probe whether a generative pathway for the erased concept can still be found. However, identifying such a visual generative pathway is challenging because standard text-guided DDIM inversion is actively resisted by text-centric defenses within the erased model. To address this, we introduce TINA, a novel Text-free INversion Attack, which enforces this visual-only probe by operating under a null-text condition, thereby avoiding existing text-centric defenses. Moreover, TINA integrates an optimization procedure to overcome the accumulating approximation errors that arise when standard inversion operates without its usual textual guidance. Our experiments demonstrate that TINA regenerates erased concepts from models treated with state-of-the-art unlearning. The success of TINA proves that current methods merely obscure concepts, highlighting an urgent need for paradigms that operate directly on internal visual knowledge.