Chenxi Du

CV
h-index 11
3 papers
17 citations
Novelty 50%
AI Score 28

3 Papers

CV · Aug 23, 2024
La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection

Hang Zou, Chenxi Du, Hui Zhang et al.

Facial recognition systems are susceptible to both physical and digital attacks, posing significant security risks. Traditional approaches often treat these two attack types separately because of their distinct characteristics. As a result, almost no existing method can handle the two kinds of attack when they occur together. Some studies attempt to combine the sparse data from both attack types into a single dataset and find a common feature space, which is often impractical because such a space is difficult to find or may not exist at all. To overcome these challenges, we propose a novel approach that uses a sparse model to handle sparse data, with different parameter groups processing distinct regions of the sparse feature space. Specifically, we employ the Mixture of Experts (MoE) framework in our model: expert parameters are matched to tokens with varying weights during training and adaptively activated during testing. However, the traditional MoE struggles with the complex and irregular classification boundaries of this problem. We therefore introduce a flexible self-adapting weighting mechanism that enables the model to better fit and adapt. In this paper, we propose La-SoftMoE CLIP, which allows for more flexible adaptation to the Unified Attack Detection (UAD) task and significantly enhances the model's capability to handle diverse attacks. Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance.
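The idea of matching expert parameters to tokens with varying weights can be illustrated with a generic soft-weighted mixture of experts. The sketch below is not the La-SoftMoE mechanism itself, which the abstract does not specify; it assumes linear experts, a softmax gate, and toy values, and the names `soft_moe_token`, `gate`, and `experts` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def soft_moe_token(token, gate, experts):
    """Mix every expert's output for one token, weighted by a softmax gate.

    Assumption: each expert is a plain linear map; a real model would use
    small neural networks here.
    """
    weights = softmax(matvec(gate, token))
    outputs = [matvec(expert, token) for expert in experts]
    dim = len(outputs[0])
    mixed = [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(dim)]
    return mixed, weights


# Hypothetical toy values: one 2-d token, two experts.
token = [1.0, 2.0]
gate = [[0.5, 0.0], [0.0, 0.5]]          # one gate row per expert
experts = [
    [[1.0, 0.0], [0.0, 1.0]],            # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],            # expert 1: doubles the token
]
mixed, weights = soft_moe_token(token, gate, experts)
```

Because every expert contributes with a nonzero weight, the mixture stays differentiable with respect to the gate, which is one common reason to prefer soft weighting over hard top-k routing.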

CV · Aug 19, 2024
A Unified Framework for Iris Anti-Spoofing: Introducing Iris Anti-Spoofing Cross-Domain-Testing Protocol and Masked-MoE Method

Hang Zou, Chenxi Du, Ajian Liu et al.

Iris recognition is widely used in high-security scenarios due to its stability and distinctiveness. However, iris images captured by different devices exhibit consistent, device-related differences, which have a considerable impact on anti-spoofing classification algorithms. Iris characteristics also vary across races, which affects classification and creates a risk of identity theft. It is therefore necessary to improve the cross-domain capabilities of iris anti-spoofing (IAS) methods so that they are more robust across different races and devices. However, no comprehensive protocol currently exists for this purpose. To address this gap, we propose an Iris Anti-Spoofing Cross-Domain-Testing (IAS-CDT) Protocol, which involves 10 datasets, belonging to 7 databases, published by 4 institutions, and collected with 6 different devices. It comprises three hierarchical sub-protocols, aimed at evaluating the average performance, cross-racial generalization, and cross-device generalization of IAS models. Moreover, to address the cross-device generalization challenge posed by the IAS-CDT Protocol, we employ multiple model parameter sets to learn from the multiple sub-datasets. Specifically, we utilize the Mixture of Experts (MoE) to fit complex data distributions using multiple sub-neural networks. To further enhance generalization, we propose a novel method, Masked-MoE (MMoE), which randomly masks a portion of tokens for some experts and requires their outputs to be similar to those of the unmasked experts. This effectively mitigates the overfitting issue of MoE. For the evaluation, we selected ResNet50, ViT-B/16, CLIP, and FLIP as representative models and benchmarked them under the proposed IAS-CDT Protocol.
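The masking idea can be sketched as a consistency penalty between an expert that sees masked tokens and one that sees the full input. This is a minimal illustration under stated assumptions, not the MMoE implementation: masking is modeled as zeroing tokens, the experts are toy mean-pooling functions, the penalty is a mean squared error, and all names (`mask_tokens`, `expert`, `consistency_loss`) are hypothetical.

```python
import random


def mask_tokens(tokens, ratio, rng):
    """Zero out a random subset of tokens; return the masked copy and the masked indices."""
    n_mask = int(len(tokens) * ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    masked = [
        [0.0] * len(t) if i in masked_idx else list(t)
        for i, t in enumerate(tokens)
    ]
    return masked, masked_idx


def expert(tokens, weight):
    """Toy expert (assumption): mean-pool the tokens, then scale each feature."""
    dim = len(tokens[0])
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [weight * p for p in pooled]


def consistency_loss(masked_out, full_out):
    """Mean squared difference between the masked and unmasked expert outputs."""
    return sum((x - y) ** 2 for x, y in zip(masked_out, full_out)) / len(full_out)


rng = random.Random(0)                      # fixed seed for reproducibility
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked, masked_idx = mask_tokens(tokens, ratio=0.5, rng=rng)

full_out = expert(tokens, weight=1.0)       # unmasked expert sees every token
masked_out = expert(masked, weight=1.0)     # masked expert sees half of them
loss = consistency_loss(masked_out, full_out)
```

In training, a term like `loss` would be added to the classification objective so that an expert cannot rely on any single token, which is the regularizing effect the abstract attributes to masking.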

CV · Mar 26, 2025
FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks

Jinwei Li, Huan-ang Gao, Wenyi Li et al. · Tsinghua

With the rapid advancements in diffusion models and 3D generation techniques, dynamic 3D content generation has become a crucial research area. However, achieving high-fidelity 4D (dynamic 3D) generation with strong spatial-temporal consistency remains a challenging task. Inspired by recent findings that pretrained diffusion features capture rich correspondences, we propose FB-4D, a novel 4D generation framework that integrates a Feature Bank mechanism to enhance both spatial and temporal consistency in generated frames. In FB-4D, we store features extracted from previous frames and fuse them into the process of generating subsequent frames, ensuring consistent characteristics across both time and multiple views. To keep the representation compact, the Feature Bank is updated by a proposed dynamic merging mechanism. Leveraging this Feature Bank, we demonstrate for the first time that generating additional reference sequences through multiple autoregressive iterations can continuously improve generation performance. Experimental results show that FB-4D significantly outperforms existing methods in terms of rendering quality, spatial-temporal consistency, and robustness. It surpasses all tuning-free multi-view generation approaches by a large margin and achieves performance on par with training-based methods.
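A bounded bank of frame features that stays compact by merging entries might look like the sketch below. The abstract does not describe the merging rule, so this example assumes one: when the bank exceeds its capacity, the two most cosine-similar features are averaged into one. The class `FeatureBank`, its `capacity` parameter, and the toy feature values are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class FeatureBank:
    """Stores features from earlier frames, merging entries to respect a capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.features = []

    def add(self, feature):
        """Append a feature, then merge until the bank fits its capacity."""
        self.features.append(list(feature))
        while len(self.features) > self.capacity:
            self._merge_most_similar()

    def _merge_most_similar(self):
        """Assumed rule: average the most similar pair into a single entry."""
        best = None
        for i in range(len(self.features)):
            for j in range(i + 1, len(self.features)):
                sim = cosine(self.features[i], self.features[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = [(x + y) / 2 for x, y in zip(self.features[i], self.features[j])]
        self.features[i] = merged
        del self.features[j]


# Hypothetical toy features from three consecutive frames.
bank = FeatureBank(capacity=2)
for frame_feature in ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]):
    bank.add(frame_feature)
```

Here the third feature is nearly parallel to the first, so those two are averaged and the dissimilar second feature is kept unchanged, leaving two entries in the bank.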