Jianwen Xiang

CR
h-index15
6papers
14citations
Novelty62%
AI Score51

6 Papers

CVMay 21
Distributed Image Compression with Multimodal Side Information at Extremely Low Bitrates

Guojun Xu, Mingyang Zhang, Jianwen Xiang et al.

Distributed Image Compression (DIC) is crucial for multi-view transmission, especially when operating at extremely low bitrates (< 0.1 bpp). Its core challenge is effectively utilizing side information to achieve high-quality reconstruction under strict bitrate budgets. However, existing DIC approaches struggle to exploit global context and object-level details from side information, leading to local blurring and the loss of fine details in the reconstruction. To address these limitations, we propose a Multimodal DIC framework (MDIC), which, for the first time, leverages side information in a multimodal manner into the DIC paradigm, effectively preserving fine-grained local details and enhancing global perceptual quality in reconstructed images. Specifically, we introduce a text-to-image diffusion-based decoder conditioned on textual side information extracted from correlated images to capture shared global semantics. Moreover, we design a feature-mask generator, supervised by a multimodal fine-grained alignment task, to strengthen the exploitation of visual side information. The generated mask serves two purposes: first, it guides the extraction of fine-grained details from losslessly transmitted side information to preserve the semantic consistency of reconstructed details; second, it regulates the extraction of clustered feature representations from the quantized VQ-VAE embeddings, compensating for category information lost under the extreme compression of the primary image. Extensive experiments on the widely used KITTI Stereo and Cityscapes datasets demonstrate that MDIC achieves state-of-the-art perceptual quality at extremely low bitrates.

CVNov 15, 2025
Known Meets Unknown: Mitigating Overconfidence in Open Set Recognition

Dongdong Zhao, Ranxin Fang, Changtian Song et al.

Open Set Recognition (OSR) requires models not only to accurately classify known classes but also to effectively reject unknown samples. However, when unknown samples are semantically similar to known classes, inter-class overlap in the feature space often causes models to assign unjustifiably high confidence to them, leading to misclassification as known classes -- a phenomenon known as overconfidence. This overconfidence undermines OSR by blurring the decision boundary between known and unknown classes. To address this issue, we propose a framework that explicitly mitigates overconfidence caused by inter-class overlap. The framework consists of two components: a perturbation-based uncertainty estimation module, which applies controllable parameter perturbations to generate diverse predictions and quantify predictive uncertainty, and an unknown detection module with distinct learning-based classifiers, implemented as a two-stage procedure, which leverages the estimated uncertainty to improve discrimination between known and unknown classes, thereby enhancing OSR performance. Experimental results on three public datasets show that the proposed framework achieves superior performance over existing OSR methods.

SEApr 21
Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning

Xinyao Zhang, Rui Wang, Jinhao Cui et al.

Multi-window mobile scenarios, such as split-screen and foldable modes, make GUI display defects more likely by forcing applications to adapt to changing window sizes and dynamic layout reflow. Existing detection techniques are limited in two ways: they are largely passive, analyzing screenshots only after problematic states have been reached, and they are mainly designed for conventional full-screen interfaces, making them less effective in multi-window settings.We propose an end-to-end framework for GUI display defect detection in multi-window mobile scenarios. The framework proactively triggers split-screen, foldable, and window-transition states during app exploration, uses Set-of-Mark (SoM) to align screenshots with widget-level interface elements, and leverages multimodal large language models with chain-of-thought prompting to detect, localize, and explain display defects. We also construct a benchmark of GUI display defects using 50 real-world Android applications.Experimental results show that multi-window settings substantially increase the exposure of layout-related defects, with text truncation increasing by 184% compared with conventional full-screen settings. At the application level, our method detects 40 defect-prone apps with a false positive rate of 10.00% and a false negative rate of 11.11%, outperforming OwlEye and YOLO-based baselines. At the fine-grained level, it achieves the best F1 score of 87.2% for widget occlusion detection.

CROct 15, 2025
Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning

Baogang Song, Dongdong Zhao, Jianwen Xiang et al.

Backdoor attacks pose a persistent security risk to deep neural networks (DNNs) due to their stealth and durability. While recent research has explored leveraging model unlearning mechanisms to enhance backdoor concealment, existing attack strategies still leave persistent traces that may be detected through static analysis. In this work, we introduce the first paradigm of revocable backdoor attacks, where the backdoor can be proactively and thoroughly removed after the attack objective is achieved. We formulate the trigger optimization in revocable backdoor attacks as a bilevel optimization problem: by simulating both backdoor injection and unlearning processes, the trigger generator is optimized to achieve a high attack success rate (ASR) while ensuring that the backdoor can be easily erased through unlearning. To mitigate the optimization conflict between injection and removal objectives, we employ a deterministic partition of poisoning and unlearning samples to reduce sampling-induced variance, and further apply the Projected Conflicting Gradient (PCGrad) technique to resolve the remaining gradient conflicts. Experiments on CIFAR-10 and ImageNet demonstrate that our method maintains ASR comparable to state-of-the-art backdoor attacks, while enabling effective removal of backdoor behavior after unlearning. This work opens a new direction for backdoor attack research and presents new challenges for the security of machine learning systems.

CRMar 10, 2021
NegDL: Privacy-Preserving Deep Learning Based on Negative Database

Dongdong Zhao, Pingchuan Zhang, Jianwen Xiang et al.

In the era of big data, deep learning has become an increasingly popular topic. It has outstanding achievements in the fields of image recognition, object detection, and natural language processing et al. The first priority of deep learning is exploiting valuable information from a large amount of data, which will inevitably induce privacy issues that are worthy of attention. Presently, several privacy-preserving deep learning methods have been proposed, but most of them suffer from a non-negligible degradation of either efficiency or accuracy. Negative database (\textit{NDB}) is a new type of data representation which can protect data privacy by storing and utilizing the complementary form of original data. In this paper, we propose a privacy-preserving deep learning method named NegDL based on \textit{NDB}. Specifically, private data are first converted to \textit{NDB} as the input of deep learning models by a generation algorithm called \textit{QK}-hidden algorithm, and then the sketches of \textit{NDB} are extracted for training and inference. We demonstrate that the computational complexity of NegDL is the same as the original deep learning model without privacy protection. Experimental results on Breast Cancer, MNIST, and CIFAR-10 benchmark datasets demonstrate that the accuracy of NegDL could be comparable to the original deep learning model in most cases, and it performs better than the method based on differential privacy.

CLApr 27, 2019
Using Context Information to Enhance Simple Question Answering

Lin Li, Mengjing Zhang, Zhaohui Chao et al.

With the rapid development of knowledge bases(KBs),question answering(QA)based on KBs has become a hot research issue. In this paper,we propose two frameworks(i.e.,pipeline framework,an end-to-end framework)to focus answering single-relation factoid question. In both of two frameworks,we study the effect of context information on the quality of QA,such as the entity's notable type,out-degree. In the end-to-end framework,we combine char-level encoding and self-attention mechanisms,using weight sharing and multi-task strategies to enhance the accuracy of QA. Experimental results show that context information can get better results of simple QA whether it is the pipeline framework or the end-to-end framework. In addition,we find that the end-to-end framework achieves results competitive with state-of-the-art approaches in terms of accuracy and take much shorter time than them.