IV | Mar 8, 2022
Abandoning the Bayer-Filter to See in the Dark
Xingbo Dong, Wanyan Xu, Zhihui Miao et al.
Low-light image enhancement, a pervasive but challenging problem, plays a central role in improving the visibility of images captured in poorly illuminated environments. Because not all photons can pass the Bayer-Filter on the sensor of a color camera, in this work we first present a De-Bayer-Filter simulator based on deep neural networks to generate a monochrome raw image from the colored raw image. Next, a fully convolutional network is proposed to achieve low-light image enhancement by fusing colored raw data with synthesized monochrome raw data. Channel-wise attention is also introduced into the fusion process to establish a complementary interaction between features from the colored and monochrome raw images. To train the convolutional networks, we propose a dataset of monochrome and color raw pairs, named the Mono-Colored Raw paired dataset (MCR), collected using a monochrome camera without a Bayer-Filter and a color camera with a Bayer-Filter. The proposed pipeline takes advantage of the fusion of the virtual monochrome and color raw images, and our extensive experiments indicate that significant improvement can be achieved by leveraging raw sensor data and data-driven learning.
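The channel-wise attention mentioned in the abstract can be illustrated with a squeeze-and-excitation style gate. This is only a minimal sketch under stated assumptions: the abstract does not specify the attention design, so the function name `channel_attention_fuse`, the two-layer gate, the random weights, and the tiny 4x4 feature maps are all hypothetical stand-ins, not the authors' network.

```python
import math
import random

def channel_attention_fuse(color_feats, mono_feats, w1, w2):
    """Fuse two feature stacks with a squeeze-and-excitation style channel gate.

    color_feats, mono_feats: lists of channels, each channel a 2D list (H x W).
    w1: (hidden x C) weights, w2: (C x hidden) weights, C = total channel count.
    All names and shapes here are illustrative assumptions.
    """
    # Concatenate color and monochrome features along the channel axis.
    channels = color_feats + mono_feats
    # Squeeze: global average pooling turns each channel into one scalar.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    # Excite: a small two-layer MLP (ReLU, then sigmoid) gives one gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every channel by its gate.
    fused = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, channels)]
    return fused, gates

rng = random.Random(0)
color = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(3)]
mono = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(1)]
C, hidden_units = 4, 2
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(hidden_units)]
w2 = [[rng.uniform(-1, 1) for _ in range(hidden_units)] for _ in range(C)]
fused, gates = channel_attention_fuse(color, mono, w1, w2)
```

Each gate lies strictly between 0 and 1, so the fusion can down-weight channels from either modality; in a trained network the gate weights would be learned rather than random.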
CV | Jun 9, 2022
Reconstruct Face from Features Using GAN Generator as a Distribution Constraint
Xingbo Dong, Zhihui Miao, Lan Ma et al.
Face recognition based on deep convolutional neural networks (CNNs) achieves superior accuracy owing to the highly discriminative features extracted. Yet the security and privacy of the features extracted by deep learning models (deep features) have often been overlooked. This paper formulates the reconstruction of face images from deep features, without access to the CNN network configuration, as a constrained optimization problem. The optimization minimizes the distance between the features extracted from the original face image and those extracted from the reconstructed face image. Instead of directly solving the optimization problem in the image space, we reformulate the problem as a search for a latent vector of a GAN generator, which is then used to generate the face image. The GAN generator serves a dual role in this framework, i.e., as a face distribution constraint on the optimization goal and as a face generator. On top of this optimization task, we also propose an attack pipeline to impersonate the target user based on the generated face image. Our results show that the generated face images achieve a state-of-the-art successful attack rate of 98.0% on LFW under type-I attack at a FAR of 0.1%. Our work sheds light on how biometric deployments can meet privacy-preserving and security policies.
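The reformulation described above (search a generator's latent space for the vector whose generated output yields features closest to the target features) can be sketched on a toy problem. Everything concrete here is an assumption for illustration: the "generator" and "feature extractor" are random linear maps, and the optimizer is a simple accept-if-better random search chosen because it needs only black-box queries; the abstract does not state which optimizer the authors use.

```python
import random

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def latent_search(generator, extractor, target_feat, dim,
                  steps=500, sigma=0.3, seed=0):
    """Black-box random search over the latent vector z.

    Only queries extractor(generator(z)); no gradients or network
    internals are used. A candidate is kept only if it reduces the
    feature distance, so the recorded distance never increases.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(dim)]
    best = sq_dist(extractor(generator(z)), target_feat)
    history = [best]
    for _ in range(steps):
        cand = [zi + rng.gauss(0, sigma) for zi in z]
        d = sq_dist(extractor(generator(cand)), target_feat)
        if d < best:
            z, best = cand, d
        history.append(best)
    return z, history

# Hypothetical stand-ins: a linear "generator" (latent -> image) and a
# linear "feature extractor" (image -> feature).
rng = random.Random(42)
G = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(4)]
F = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
generator = lambda z: matvec(G, z)
extractor = lambda x: matvec(F, x)

target_feat = extractor(generator([0.5, -1.0]))   # feature of the "original" image
z_hat, history = latent_search(generator, extractor, target_feat, dim=2)
reconstruction = generator(z_hat)
```

Because every candidate passes through the generator, the reconstruction is always a point the generator can produce, which is the sense in which the generator acts as a distribution constraint.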
CV | Nov 11, 2025 | Code
Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs
Yuezhe Yang, Yiyue Guo, Wenjie Cai et al.
AI-assisted ultrasound video diagnosis presents new opportunities to enhance the efficiency and accuracy of medical imaging analysis. However, existing research remains limited in terms of dataset diversity, diagnostic performance, and clinical applicability. In this study, we propose Auto-US, an intelligent diagnosis agent that integrates ultrasound video data with clinical diagnostic text. To support this, we constructed the CUV Dataset of 495 ultrasound videos spanning five categories and three organs, aggregated from multiple open-access sources. We developed CTU-Net, which achieves state-of-the-art performance in ultrasound video classification, reaching an accuracy of 86.73%. Furthermore, by incorporating large language models, Auto-US is capable of generating clinically meaningful diagnostic suggestions. The final diagnostic scores for each case exceeded 3 out of 5 and were validated by professional clinicians. These results demonstrate the effectiveness and clinical potential of Auto-US in real-world ultrasound applications. Code and data are available at: https://github.com/Bean-Young/Auto-US.
CV | Mar 2, 2022
A Generalized Approach for Cancellable Template and Its Realization for Minutia Cylinder-Code
Xingbo Dong, Zhe Jin, KokSheik Wong
Hashing has recently gained much attention for biometric template protection. For instance, Index-of-Max (IoM), a recently reported ranking-based locality-sensitive hashing technique, demonstrates the feasibility of protecting ordered, fixed-length biometric templates. However, biometric templates are not always ordered and fixed-length; they may instead be unordered, variable-size point sets, e.g. fingerprint minutiae, which restricts the use of traditional hashing techniques. In this paper, we propose a generalized version of IoM hashing, namely gIoM, so that unordered and variable-size biometric templates can be used. We demonstrate a realization using a well-known variable-size feature vector, the fingerprint Minutia Cylinder-Code (MCC). gIoM transforms MCC into the index domain to form an indexing-based feature representation. Consequently, inverting MCC from the transformed representation is computationally infeasible, which achieves non-invertibility while preserving performance. The public fingerprint databases FVC2002 and FVC2004 are used as benchmarks to provide a fair comparison with other methods. Moreover, the security and privacy analysis suggests that gIoM meets the criteria of template protection: non-invertibility, revocability, and non-linkability.
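For orientation, the baseline Index-of-Max idea for an ordered, fixed-length vector can be sketched as follows: project the feature onto several random directions and keep only the index of the largest projection. This is a minimal sketch of the Gaussian-random-projection flavor of IoM, not of gIoM itself; the function names, the use of a seed as the user-specific token, and the parameter values `m` and `q` are assumptions for illustration.

```python
import random

def iom_hash(feature, m, q, seed):
    """Index-of-Max sketch (Gaussian random projection flavor).

    For each of m hash functions, project the feature onto q random
    Gaussian vectors and record only the index of the largest projection.
    The seed stands in for a user-specific token (an assumption here):
    issuing a new seed yields a new, independent template.
    """
    rng = random.Random(seed)
    code = []
    for _ in range(m):
        projections = []
        for _ in range(q):
            w = [rng.gauss(0, 1) for _ in feature]
            projections.append(sum(wi * xi for wi, xi in zip(w, feature)))
        # Keep the rank information only, discarding projection magnitudes.
        code.append(max(range(q), key=projections.__getitem__))
    return code

def index_similarity(a, b):
    """Fraction of positions where two index codes agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

x = [0.2, -1.3, 0.7, 0.05, 1.1]
code_a = iom_hash(x, m=16, q=4, seed=123)                     # enrolled template
code_scaled = iom_hash([2 * v for v in x], m=16, q=4, seed=123)
code_b = iom_hash(x, m=16, q=4, seed=999)                     # re-issued template
```

Because only the arg-max index is stored, positively rescaling the input leaves the code unchanged, which illustrates the ranking-based nature of the hash; matching is then done by comparing indices, e.g. with `index_similarity`.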
CV | Jul 2, 2024
Face Reconstruction Transfer Attack as Out-of-Distribution Generalization
Yoon Gyo Jung, Jaewoo Park, Xingbo Dong et al.
Understanding the vulnerability of face recognition systems to malicious attacks is of critical importance. Previous works have focused on reconstructing face images that can penetrate a targeted verification system. Even in the white-box scenario, however, naively reconstructed images misrepresent the identity information, hence the attacks are easily neutralized once the face system is updated or changed. In this paper, we aim to reconstruct face images that are capable of transferring face attacks to unseen encoders. We term this problem the Face Reconstruction Transfer Attack (FRTA) and show that it can be formulated as an out-of-distribution (OOD) generalization problem. Inspired by its OOD nature, we propose to solve FRTA by Averaged Latent Search and Unsupervised Validation with pseudo target (ALSUV). To strengthen the reconstruction attack on OOD unseen encoders, ALSUV reconstructs the face by searching the latent of the amortized generator StyleGAN2 through multiple latent optimization, latent optimization trajectory averaging, and unsupervised validation with a pseudo target. We demonstrate the efficacy and generalization of our method on widely used face datasets, accompanied by extensive ablation studies and visual, qualitative, and quantitative analyses. The source code will be released.
CV | Nov 11, 2025 | Code
UltraGS: Gaussian Splatting for Ultrasound Novel View Synthesis
Yuezhe Yang, Wenjie Cai, Dexin Yang et al.
Ultrasound imaging is a cornerstone of non-invasive clinical diagnostics, yet its limited field of view complicates novel view synthesis. We propose UltraGS, a Gaussian Splatting framework optimized for ultrasound imaging. First, we introduce a depth-aware Gaussian splatting strategy, where each Gaussian is assigned a learnable field of view, enabling accurate depth prediction and precise structural representation. Second, we design SH-DARS, a lightweight rendering function combining low-order spherical harmonics with ultrasound-specific wave physics, including depth attenuation, reflection, and scattering, to model tissue intensity accurately. Third, we contribute the Clinical Ultrasound Examination Dataset, a benchmark capturing diverse anatomical scans under real-world clinical protocols. Extensive experiments on three datasets demonstrate UltraGS's superiority, achieving state-of-the-art results in PSNR (up to 29.55), SSIM (up to 0.89), and MSE (as low as 0.002) while enabling real-time synthesis at 64.69 fps. The code and dataset are open-sourced at: https://github.com/Bean-Young/UltraGS.
LG | Sep 27, 2023
On the Computational Entanglement of Distant Features in Adversarial Machine Learning
YenLung Lai, Xingbo Dong, Zhe Jin
In this research, we introduce the concept of "computational entanglement," a phenomenon observed in overparameterized feedforward linear networks that enables the network to achieve zero loss by fitting random noise, even on previously unseen test samples. Analyzing this behavior through spacetime diagrams reveals its connection to length contraction, where both training and test samples converge toward a shared normalized point within a flat Riemannian manifold. Moreover, we present a novel application of computational entanglement in transforming worst-case adversarial examples (inputs that are highly non-robust and uninterpretable to human observers) into outputs that are both recognizable and robust. This provides new insights into the behavior of non-robust features in adversarial example generation, underscoring the critical role of computational entanglement in enhancing model robustness and advancing our understanding of neural networks in adversarial contexts.
CV | Mar 17, 2025 | Code
Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation
Xingguo Lv, Xingbo Dong, Liwen Wang et al.
Although domain generalization (DG) has significantly mitigated the performance degradation of pre-trained models caused by domain shifts, it often falls short in real-world deployment. Test-time adaptation (TTA), which adjusts a learned model using unlabeled test data, presents a promising solution. However, most existing TTA methods struggle to deliver strong performance in medical image segmentation, primarily because they overlook the crucial prior knowledge inherent to medical images. To address this challenge, we incorporate morphological information and propose a framework based on multi-graph matching. Specifically, we introduce learnable universe embeddings that integrate morphological priors during multi-source training, along with novel unsupervised test-time paradigms for domain adaptation. This approach guarantees cycle-consistency in multi-matching while enabling the model to more effectively capture the invariant priors of unseen data, significantly mitigating the effects of domain shifts. Extensive experiments demonstrate that our method outperforms other state-of-the-art approaches on two medical image segmentation benchmarks for both multi-source and single-source domain generalization tasks. The source code is available at https://github.com/Yore0/TTDG-MGM.
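Why matching through a shared universe yields cycle-consistency can be seen in a small discrete example. This is a sketch under simplifying assumptions: the paper learns continuous universe embeddings, whereas here each node is hard-assigned to one universe point, and the graph names and assignments are hypothetical toy data.

```python
def pairwise_from_universe(assign_i, assign_j):
    """Derive a matching between graphs i and j from their universe assignments.

    assign_g[n] is the universe index of node n in graph g. Node a of graph i
    is matched to node b of graph j when both map to the same universe point.
    """
    inverse_j = {u: b for b, u in enumerate(assign_j)}
    return {a: inverse_j[u] for a, u in enumerate(assign_i) if u in inverse_j}

def compose(m_ij, m_jk):
    """Chain two matchings: i -> j -> k."""
    return {a: m_jk[b] for a, b in m_ij.items() if b in m_jk}

# Hypothetical toy assignments of three graphs' nodes to a 3-point universe.
assign = {
    "A": [0, 1, 2],
    "B": [2, 0, 1],
    "C": [1, 2, 0],
}

m_ab = pairwise_from_universe(assign["A"], assign["B"])
m_bc = pairwise_from_universe(assign["B"], assign["C"])
m_ac = pairwise_from_universe(assign["A"], assign["C"])

# Going A -> B -> C agrees with going A -> C directly.
cycle_consistent = compose(m_ab, m_bc) == m_ac
```

Since every pairwise matching is derived from the same universe assignment, composing matchings along any path gives the same result as the direct matching, so consistency holds by construction rather than needing to be enforced afterwards.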
CV | Apr 15, 2025 | Code
Explicit and Implicit Representations in AI-based 3D Reconstruction for Radiology: A Systematic Review
Yuezhe Yang, Boyu Yang, Yaqian Wang et al.
The demand for high-quality medical imaging in clinical practice and assisted diagnosis has made 3D reconstruction in radiological imaging a key research focus. Artificial intelligence (AI) has emerged as a promising approach to enhancing reconstruction accuracy while reducing acquisition and processing time, thereby minimizing patient radiation exposure and discomfort and ultimately benefiting clinical diagnosis. This review explores state-of-the-art AI-based 3D reconstruction algorithms in radiological imaging, categorizing them into explicit and implicit approaches based on their underlying principles. Explicit methods include point-based, volume-based, and Gaussian representations, while implicit methods encompass implicit prior embedding and neural radiance fields. Additionally, we examine commonly used evaluation metrics and benchmark datasets. Finally, we discuss the current state of development, key challenges, and future research directions in this evolving field. Our project is available at: https://github.com/Bean-Young/AI4Radiology.
CV | Apr 12, 2024
IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer
Yuhang Qiu, Honghui Chen, Xingbo Dong et al.
Determining the dense feature points on fingerprints that are used to construct deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese Network to capture long-range dependencies and the global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances the interpretability of the subsequent matching stage. The second module takes into account both local and global representations of the aligned fingerprint pair to achieve interpretable fixed-length representation extraction and matching. It employs the ViTs trained in the first module with an additional fully connected layer and retrains them to simultaneously produce the discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points. Extensive experimental results on diverse publicly available fingerprint databases demonstrate that the proposed framework not only exhibits superior performance on dense registration and matching but also significantly promotes interpretability in fingerprint matching based on deep fixed-length representations.
CV | Apr 8
EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling
Qingguo Meng, Xingbo Dong, Zhe Jin et al.
Event cameras offer a promising sensing modality for face recognition due to their inherent advantages in illumination robustness and privacy-friendliness. However, because event streams lack the stable photometric appearance relied upon by conventional RGB-based face recognition systems, we argue that event-based face recognition should model structure-driven spatiotemporal identity representations shaped by rigid facial motion and individual facial geometry. Since dedicated datasets for event-based face recognition remain lacking, we construct EFace, a small-scale event-based face dataset captured under rigid facial motion. To learn effectively from this limited event data, we further propose EventFace, a framework for event-based face recognition that integrates spatial structure and temporal dynamics for identity modeling. Specifically, we employ Low-Rank Adaptation (LoRA) to transfer structural facial priors from pretrained RGB face models to the event domain, thereby establishing a reliable spatial basis for identity modeling. Building on this foundation, we further introduce a Motion Prompt Encoder (MPE) to explicitly encode temporal features and a Spatiotemporal Modulator (STM) to fuse them with spatial features, thereby enhancing the representation of identity-relevant event patterns. Extensive experiments demonstrate that EventFace achieves the best performance among the evaluated baselines, with a Rank-1 identification rate of 94.19% and an equal error rate (EER) of 5.35%. Results further indicate that EventFace exhibits stronger robustness under degraded illumination than the competing methods. In addition, the learned representations exhibit reduced template reconstructability.
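The equal error rate (EER) quoted above is the operating point where the false accept rate (FAR) equals the false reject rate (FRR). A minimal sketch of how it can be estimated from comparison scores follows; the score lists are made-up toy values, not results from the paper, and the convention that higher scores mean "more similar" is an assumption.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR at one threshold, assuming higher score = more similar."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuines rejected
    return far, frr

def equal_error_rate(genuine, impostor):
    """Scan every observed score as a threshold; return the point where
    FAR and FRR are closest, reporting their average as the EER."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Toy similarity scores (illustrative only).
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

eer, threshold = equal_error_rate(genuine, impostor)
```

With these toy scores the two error rates meet at a threshold of 0.6, where one of five impostors is accepted and one of five genuine comparisons is rejected, giving an EER of 20%.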
CV | Nov 8, 2024
Video RWKV: Video Action Recognition Based RWKV
Zhuowen Yin, Chengru Li, Xingbo Dong
To address the challenges of high computational costs and long-distance dependencies in existing video understanding methods, such as CNNs and Transformers, this work introduces RWKV to the video domain in a novel way. We propose an LSTM CrossRWKV (LCR) framework, designed for spatiotemporal representation learning to tackle the video understanding task. Specifically, the proposed linear-complexity LCR incorporates a novel Cross RWKV gate to facilitate interaction between current-frame edge information and past features, enhancing the focus on the subject through edge features and globally aggregating inter-frame features over time. LCR stores long-term memory for video processing through an enhanced LSTM recurrent execution mechanism. By leveraging the Cross RWKV gate and recurrent execution, LCR effectively captures both spatial and temporal features. Additionally, the edge information serves as a forgetting gate for the LSTM, guiding long-term memory management. A tube masking strategy reduces redundant information in video and reduces overfitting. These advantages enable LSTM CrossRWKV to set a new benchmark in video understanding, offering a scalable and efficient solution for comprehensive video analysis. All code and models are publicly available.
IV | Jul 1, 2025
EvRWKV: A Continuous Interactive RWKV Framework for Effective Event-Guided Low-Light Image Enhancement
Wenjie Cai, Qingguo Meng, Zhenyu Wang et al.
Event cameras offer significant potential for Low-light Image Enhancement (LLIE), yet existing fusion approaches are constrained by a fundamental dilemma: early fusion struggles with modality heterogeneity, while late fusion severs crucial feature correlations. To address these limitations, we propose EvRWKV, a novel framework that enables continuous cross-modal interaction through dual-domain processing. It mainly comprises a Cross-RWKV Module to capture fine-grained temporal and cross-modal dependencies, and an Event Image Spectral Fusion Enhancer (EISFE) module to perform joint adaptive frequency-domain denoising and spatial-domain alignment. This continuous interaction maintains feature consistency from low-level textures to high-level semantics. Extensive experiments on the real-world SDE and SDSD datasets demonstrate that EvRWKV significantly outperforms image-only methods by 1.79 dB and 1.85 dB in PSNR, respectively. To further validate the practical utility of our method for downstream applications, we evaluated its impact on semantic segmentation. Experiments demonstrate that images enhanced by EvRWKV lead to a significant 35.44% improvement in mIoU.
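To give the decibel figures some intuition, the standard PSNR definition can be used to translate a PSNR gain into a reduction of mean squared error. This is a rough sketch: it assumes the same peak value before and after, and it treats the gain as applying to a single image, whereas reported dataset PSNR is usually an average over images, so the ratios below are indicative only.

```python
import math

def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain (same peak value).

    Follows from PSNR_new - PSNR_old = 10 * log10(MSE_old / MSE_new).
    """
    return 10.0 ** (-gain_db / 10.0)

ratio_sde = mse_ratio_for_gain(1.79)    # gain reported on SDE
ratio_sdsd = mse_ratio_for_gain(1.85)   # gain reported on SDSD
```

Under these assumptions, gains of 1.79 dB and 1.85 dB correspond to the mean squared error dropping to roughly two thirds of its previous value.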
CR | Jun 23, 2020
Interpretable security analysis of cancellable biometrics using constrained-optimized similarity-based attack
Hanrui Wang, Xingbo Dong, Zhe Jin et al.
In cancellable biometrics (CB) schemes, template security is achieved by applying mainly non-linear transformations to the biometric template. The transformation is designed to preserve the template distance/similarity in the transformed domain. Despite its effectiveness, the security issues attributed to the similarity preservation property of CB are underestimated. Dong et al. [BTAS'19] exploited the similarity preservation trait of CB and proposed a similarity-based attack with a high successful attack rate. The similarity-based attack uses preimages generated from the protected biometric template for impersonation and cross-matching. In this paper, we propose a constrained optimization similarity-based attack (CSA), which improves upon Dong's genetic algorithm enabled similarity-based attack (GASA). The CSA applies algorithm-specific equality or inequality relations as constraints to optimize preimage generation. We interpret the effectiveness of CSA from the supervised learning perspective. We identify such constraints and then conduct extensive experiments to demonstrate CSA against CB with the LFW face dataset. The results suggest that CSA is effective in breaching the security of IoM hashing and BioHashing, and outperforms GASA significantly. Inferring from the above results, we further remark that, beyond IoM and BioHashing, CSA poses a critical threat to other CB schemes as long as the constraints can be formulated. Furthermore, we reveal the correlation between hash code size and the attack performance of CSA.
CV | Jun 9, 2020
Multi-spectral Facial Landmark Detection
Jin Keong, Xingbo Dong, Zhe Jin et al.
Thermal face image analysis is favorable in certain circumstances, for example illumination-sensitive applications such as nighttime surveillance, and access control that demands privacy preservation. However, thermal face image analysis remains insufficiently studied relative to industry requirements. Detecting facial landmark points is important for many face analysis tasks, such as face recognition, 3D face reconstruction, and facial expression recognition. In this paper, we propose a robust neural-network-based facial landmark detector, namely Deep Multi-Spectral Learning (DMSL). Briefly, DMSL consists of two sub-models, i.e. face boundary detection and landmark coordinate detection. Such an architecture demonstrates the capability of detecting facial landmarks on both visible and thermal images. In particular, the proposed DMSL model is robust in facial landmark detection when the face is partially occluded or facing different directions. The experiment conducted on Eurecom's visible and thermal paired database shows the superior performance of DMSL over the state of the art for thermal facial landmark detection. In addition, we have annotated a thermal face dataset with the respective facial landmarks for the purpose of experimentation.
CV | Oct 17, 2019
On the Risk of Cancelable Biometrics
Xingbo Dong, Jaewoo Park, Zhe Jin et al.
Cancelable biometrics (CB) employs an irreversible transformation to convert biometric features into transformed templates while preserving the relative distance between two templates for security and privacy protection. However, distance preservation invites unexpected security issues such as pre-image attacks, which are often neglected. This paper presents a generalized pre-image attack method and an extended version that operates on practical CB systems. We theoretically reveal that the distance preservation property is a source of vulnerability in CB schemes. We then propose an empirical information leakage estimation algorithm to assess the pre-image attack risk of CB schemes. The experiments, conducted with six CB schemes designed for the face, iris and fingerprint, demonstrate that the risks originating from the distance computed from two transformed templates significantly compromise the security of CB schemes. Our work reveals the potential risk of existing CB systems theoretically and experimentally.
CV | May 8, 2019
A Genetic Algorithm Enabled Similarity-Based Attack on Cancellable Biometrics
Xingbo Dong, Zhe Jin, Andrew Teoh Beng Jin
Cancellable biometrics (CB), as an approach to biometric template protection, refers to an irreversible yet similarity-preserving transformation of the original template. With the similarity-preserving property, matching between a template and a query instance can be performed in the transformed domain without jeopardizing accuracy. Unfortunately, this trait invites a class of attack, namely the similarity-based attack (SA). SA produces a preimage, an inverse of the transformed template, which can be exploited for impersonation and cross-matching. In this paper, we propose a Genetic Algorithm enabled similarity-based attack framework (GASAF) to demonstrate that CB schemes that possess the similarity-preserving property are highly vulnerable to similarity-based attack. In addition, a set of new metrics is designed to measure the effectiveness of the similarity-based attack. We conduct the experiment on two representative CB schemes, i.e. BioHashing and Bloom-filter. The experimental results attest to the vulnerability under this type of attack.
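The general shape of a genetic-algorithm similarity-based attack can be sketched on a simplified BioHashing-like scheme. This is a toy illustration, not GASAF itself: the hash is reduced to "random projection, then sign", the attacker is assumed to know the projection matrix and the stolen hash code (a stolen-token scenario), and the population size, mutation rate, and other GA settings are arbitrary choices made for this sketch.

```python
import random

def biohash(x, proj):
    """Simplified BioHashing-like code: sign of each random projection."""
    return [1 if sum(w * v for w, v in zip(row, x)) > 0 else 0 for row in proj]

def hamming_similarity(a, b):
    """Fraction of matching bits between two binary codes."""
    return sum(p == q for p, q in zip(a, b)) / len(a)

def ga_preimage(target_code, proj, dim, pop_size=30, generations=60, seed=1):
    """Evolve a feature vector whose hash is similar to the target code.

    Fitness is the similarity in the transformed domain, which is exactly
    what a similarity-preserving scheme exposes. Elitism keeps the best
    candidates unchanged, so the best fitness never decreases.
    """
    rng = random.Random(seed)

    def fitness(x):
        return hamming_similarity(biohash(x, proj), target_code)

    pop = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 5]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: perturb each gene with probability 0.2.
            child = [g + rng.gauss(0, 0.2) if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        pop = elite + children
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history

# Toy setup: a secret feature vector and its protected template.
setup_rng = random.Random(0)
dim, bits = 8, 16
proj = [[setup_rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
secret = [setup_rng.gauss(0, 1) for _ in range(dim)]
target = biohash(secret, proj)

preimage, history = ga_preimage(target, proj, dim)
attack_similarity = hamming_similarity(biohash(preimage, proj), target)
```

The preimage need not equal the secret vector; it only needs to hash close enough to the target to pass the matcher, which is why similarity preservation by itself can leak enough information for impersonation.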