Zezheng Wang

CV
h-index: 7
14 papers
1,532 citations
Novelty: 48%
AI Score: 43

14 Papers

CV · Mar 10, 2022 · Code
Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing

Zhuo Wang, Zezheng Wang, Zitong Yu et al.

With diverse presentation attacks emerging continually, generalizable face anti-spoofing (FAS) has drawn growing attention. Most existing methods implement domain generalization (DG) on the complete representations. However, different image statistics may have unique properties for the FAS task. In this work, we separate the complete representation into content and style representations. A novel Shuffled Style Assembly Network (SSAN) is proposed to extract and reassemble different content and style features for a stylized feature space. Then, to obtain a generalized representation, a contrastive learning strategy is developed to emphasize liveness-related style information while suppressing the domain-specific one. Finally, the representations of the correct assemblies are used to distinguish between living and spoofing faces during inference. On the other hand, despite the decent performance, there still exists a gap between academia and industry due to differences in data quantity and distribution. Thus, a new large-scale benchmark for FAS is built to further evaluate the performance of algorithms in real-world conditions. Both qualitative and quantitative results on existing and proposed benchmarks demonstrate the effectiveness of our methods. The code will be available at https://github.com/wangzhuo2019/SSAN.
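The abstract does not specify how content and style features are reassembled. The following is a minimal sketch, assuming (our assumption, not confirmed by the abstract) that "style" is represented by per-channel mean and standard deviation, as in AdaIN, and that shuffling pairs each sample's content with the style of a randomly permuted sample in the batch. Features are plain nested lists (sample, then channel, then flattened spatial values), all function names are hypothetical, and the contrastive learning stage is not shown.

```python
import math
import random


def channel_stats(channel, eps=1e-5):
    """Mean and (eps-stabilised) standard deviation of one flattened channel."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def assemble(content, style, eps=1e-5):
    """AdaIN-style assembly: normalise each content channel with its own
    statistics, then re-scale and re-shift it with the style channel's."""
    out = []
    for c_ch, s_ch in zip(content, style):
        c_mean, c_std = channel_stats(c_ch, eps)
        s_mean, s_std = channel_stats(s_ch, eps)
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_ch])
    return out


def shuffled_assembly(batch, rng):
    """Pair every sample's content with the style of a randomly permuted sample.

    Returns the assembled features and the permutation that was used."""
    perm = list(range(len(batch)))
    rng.shuffle(perm)
    return [assemble(batch[i], batch[j]) for i, j in enumerate(perm)], perm


# Two hypothetical one-channel samples with four spatial positions each.
batch = [[[0.0, 1.0, 2.0, 3.0]], [[10.0, 10.5, 11.0, 11.5]]]
mixed, perm = shuffled_assembly(batch, random.Random(0))
```

After `assemble(content, style)` each output channel carries the style channel's mean, and its spread matches the style channel up to the small `eps` term. The `perm` returned by `shuffled_assembly` records which style each output borrowed, which a contrastive objective could then use.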

AI · Aug 26, 2025 · Code
Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units

Chao Hao, Zezheng Wang, Yanhua Huang et al.

This paper investigates the enhancement of reasoning capabilities in language models through token-level multi-model collaboration. Our approach selects the optimal tokens from the next token distributions provided by multiple models to perform autoregressive reasoning. Contrary to the assumption that more models yield better results, we introduce a distribution distance-based dynamic selection strategy (DDS) to optimize the multi-model collaboration process. To address the critical challenge of vocabulary misalignment in multi-model collaboration, we propose the concept of minimal complete semantic units (MCSU), which is simple yet enables multiple language models to achieve natural alignment within the linguistic space. Experimental results across various benchmarks demonstrate the superiority of our method. The code will be available at https://github.com/Fanye12/DDS.

CV · Nov 24, 2021 · Code
Consistency Regularization for Deep Face Anti-Spoofing

Zezheng Wang, Zitong Yu, Xun Wang et al.

Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems. Empirically, given an image, a model with more consistent outputs on different views of this image usually performs better, as shown in Fig.1. Motivated by this observation, we conjecture that encouraging feature consistency across different views may be a promising way to boost FAS models. In this paper, we explore this direction thoroughly by enforcing both Embedding-level and Prediction-level Consistency Regularization (EPCR) in FAS. Specifically, at the embedding level, we design a dense similarity loss that maximizes the similarities between all positions of two intermediate feature maps in a self-supervised fashion, while at the prediction level, we optimize the mean squared error between the predictions of the two views. Notably, EPCR requires no annotations and can be directly integrated into semi-supervised learning schemes. Considering different application scenarios, we further design five diverse semi-supervised protocols to measure semi-supervised FAS techniques. Extensive experiments show that EPCR significantly improves the performance of several supervised and semi-supervised tasks on benchmark datasets. The code and protocols will be released at https://github.com/clks-wzz/EPCR.
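A minimal sketch of the two consistency terms, reading "all positions" as all cross-view pairs of spatial positions (an assumption; the paper may pair positions differently). Each feature map is a list of C-dimensional vectors, one per position; `weight` and the function names are hypothetical. Neither term touches a label, which is what lets this kind of regularizer run on unlabeled data.

```python
import math


def l2_normalize(vec, eps=1e-12):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def dense_similarity_loss(feat_a, feat_b):
    """feat_a, feat_b: lists of C-dim vectors, one per spatial position,
    from two augmented views of the same image.

    Returns 1 - mean cosine similarity over all (i, j) position pairs, so
    minimising it pulls the two views' embeddings together."""
    a = [l2_normalize(v) for v in feat_a]
    b = [l2_normalize(v) for v in feat_b]
    total = sum(sum(x * y for x, y in zip(va, vb)) for va in a for vb in b)
    return 1.0 - total / (len(a) * len(b))


def prediction_mse(pred_a, pred_b):
    """Mean squared error between the two views' predictions."""
    return sum((p - q) ** 2 for p, q in zip(pred_a, pred_b)) / len(pred_a)


def epcr_loss(feat_a, feat_b, pred_a, pred_b, weight=1.0):
    """Label-free consistency loss; 'weight' is a hypothetical balance term."""
    return dense_similarity_loss(feat_a, feat_b) + weight * prediction_mse(pred_a, pred_b)
```

Identical directions across views give a loss near 0, and orthogonal embeddings give a loss of 1, so the embedding term behaves like a bounded distance between the views.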

CV · Apr 17, 2020 · Code
Multi-Modal Face Anti-Spoofing Based on Central Difference Networks

Zitong Yu, Yunxiao Qin, Xiaobai Li et al.

Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks. Existing multi-modal FAS methods rely on stacked vanilla convolutions, which are weak at describing detailed intrinsic information from the modalities and easily become ineffective when the domain shifts (e.g., cross-attack and cross-ethnicity settings). In this paper, we extend the central difference convolutional networks (CDCN) \cite{yu2020searching} to a multi-modal version, intending to capture intrinsic spoofing patterns among three modalities (RGB, depth and infrared). We also present an elaborate study of single-modal CDCN. Our approach won first place in "Track Multi-Modal" as well as second place in "Track Single-Modal (RGB)" of the ChaLearn Face Anti-spoofing Attack Detection Challenge@CVPR2020 \cite{liu2020cross}. Our final submission obtains 1.02±0.59% and 4.84±1.79% ACER in "Track Multi-Modal" and "Track Single-Modal (RGB)", respectively. The code is available at https://github.com/ZitongYu/CDCN.
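The results above are reported in ACER. A small sketch of the standard definition, assuming hard decisions with 1 for bona fide (live) and 0 for attack: APCER is the share of attacks accepted as live, BPCER is the share of live samples rejected, and ACER is their mean. Some protocols take the worst APCER over attack types; this sketch pools all attacks into a single APCER for simplicity.

```python
def fas_error_rates(labels, predictions):
    """labels / predictions: 1 = bona fide (live), 0 = attack (spoof).

    APCER: fraction of attack samples accepted as bona fide.
    BPCER: fraction of bona fide samples rejected as attacks.
    ACER:  average of APCER and BPCER.
    """
    attacks = [p for y, p in zip(labels, predictions) if y == 0]
    bona_fide = [p for y, p in zip(labels, predictions) if y == 1]
    apcer = sum(1 for p in attacks if p == 1) / len(attacks)
    bpcer = sum(1 for p in bona_fide if p == 0) / len(bona_fide)
    return apcer, bpcer, (apcer + bpcer) / 2
```

For example, `fas_error_rates([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0])` gives APCER 0.25, BPCER 0.5 and ACER 0.375.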

CV · Mar 18, 2020 · Code
Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing

Zezheng Wang, Zitong Yu, Chenxu Zhao et al.

Face anti-spoofing is critical to the security of face recognition systems. Depth-supervised learning has proven to be one of the most effective methods for face anti-spoofing. Despite this success, most previous works still formulate the problem as a single-frame multi-task one by simply augmenting the loss with depth, neglecting detailed fine-grained information and the interplay between facial depths and moving patterns. In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between living and spoofing faces may be discarded through stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues for detecting spoofing faces. The proposed method captures discriminative details via a Residual Spatial Gradient Block (RSGB) and efficiently encodes spatio-temporal information with a Spatio-Temporal Propagation Module (STPM). Moreover, a novel Contrastive Depth Loss is presented for more accurate depth supervision. To assess the efficacy of our method, we also collect a Double-modal Anti-spoofing Dataset (DMAD), which provides actual depth for each sample. Experiments demonstrate that the proposed approach achieves state-of-the-art results on five benchmark datasets: OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, and the new DMAD. Code will be available at https://github.com/clks-wzz/FAS-SGTD.
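The abstract does not define the Contrastive Depth Loss. A minimal sketch, assuming (our reading of the method, not a statement from the abstract) that it compares neighbour-minus-centre depth differences in 3×3 windows between the predicted and ground-truth depth maps, which is equivalent to convolving both maps with eight kernels that hold -1 at the centre and +1 at one neighbour. Border pixels are skipped instead of padded, and the function names are hypothetical.

```python
# The 8 neighbour offsets (row, col) around a centre pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def contrast_maps(depth):
    """For every interior pixel, the 8 differences neighbour - centre.

    Equivalent to convolving with 8 kernels that hold -1 at the centre
    and +1 at one neighbour (valid region only)."""
    h, w = len(depth), len(depth[0])
    return [
        [[depth[r + dr][c + dc] - depth[r][c] for c in range(1, w - 1)]
         for r in range(1, h - 1)]
        for dr, dc in NEIGHBOURS
    ]


def contrastive_depth_loss(pred, target):
    """Mean squared error between predicted and ground-truth contrast maps."""
    total, count = 0.0, 0
    for mp, mt in zip(contrast_maps(pred), contrast_maps(target)):
        for row_p, row_t in zip(mp, mt):
            for p, t in zip(row_p, row_t):
                total += (p - t) ** 2
                count += 1
    return total / count
```

Because only local differences enter the loss, shifting the whole predicted map by a constant leaves it unchanged. It supervises relative depth structure, so in practice it would likely be paired with an absolute term such as a pixel-wise MSE.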

CV · Mar 9, 2020 · Code
Searching Central Difference Convolutional Networks for Face Anti-Spoofing

Zitong Yu, Chenxu Zhao, Zezheng Wang et al.

Face anti-spoofing (FAS) plays a vital role in face recognition systems. Most state-of-the-art FAS methods 1) rely on stacked convolutions and expert-designed networks, which are weak at describing detailed fine-grained information and easily become ineffective when the environment varies (e.g., different illumination), and 2) prefer long sequences as input to extract dynamic features, making them difficult to deploy in scenarios that need a quick response. Here we propose a novel frame-level FAS method based on Central Difference Convolution (CDC), which captures intrinsic detailed patterns by aggregating both intensity and gradient information. A network built with CDC, called the Central Difference Convolutional Network (CDCN), provides more robust modeling capacity than its counterpart built with vanilla convolution. Furthermore, over a specifically designed CDC search space, Neural Architecture Search (NAS) is used to discover a more powerful network structure (CDCN++), which can be assembled with a Multiscale Attention Fusion Module (MAFM) to further boost performance. Comprehensive experiments on six benchmark datasets show that the proposed method 1) achieves superior performance on intra-dataset testing (especially 0.2% ACER in Protocol-1 of the OULU-NPU dataset), and 2) generalizes well on cross-dataset testing (particularly 6.5% HTER from CASIA-MFSD to Replay-Attack). The code is available at https://github.com/ZitongYu/CDCN.
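Central difference convolution, as we understand it from the CDCN formulation, can be written as y(p0) = θ·Σ w(pn)·(x(p0+pn) - x(p0)) + (1 - θ)·Σ w(pn)·x(p0+pn), which simplifies to a vanilla convolution minus θ·x(p0)·Σ w(pn). The sketch below implements that simplified form for a single-channel map with valid (unpadded) borders; θ = 0.7 is only an illustrative default, and the function names are ours.

```python
def vanilla_conv(x, w):
    """Valid 2-D cross-correlation of a single-channel map x with kernel w."""
    kh, kw = len(w), len(w[0])
    h, wd = len(x), len(x[0])
    return [[sum(w[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(wd - kw + 1)] for r in range(h - kh + 1)]


def central_difference_conv(x, w, theta=0.7):
    """y(p0) = sum_n w(p_n) x(p0 + p_n) - theta * x(p0) * sum_n w(p_n).

    theta = 0 recovers vanilla convolution; theta = 1 aggregates only the
    gradient-like differences x(p0 + p_n) - x(p0). Assumes an odd-sized kernel."""
    kh, kw = len(w), len(w[0])
    w_sum = sum(sum(row) for row in w)
    out = vanilla_conv(x, w)
    for r in range(len(out)):
        for c in range(len(out[0])):
            centre = x[r + kh // 2][c + kw // 2]  # x(p0) for this window
            out[r][c] -= theta * centre * w_sum
    return out
```

With a constant input and θ = 1 the output is zero, since there is no gradient to aggregate, while θ = 0 reproduces the vanilla convolution exactly. Intermediate θ blends intensity and gradient information, which is the property the abstract highlights.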

CV · Nov 12, 2021
Meta-Teacher For Face Anti-Spoofing

Yunxiao Qin, Zitong Yu, Longbin Yan et al.

Face anti-spoofing (FAS) secures face recognition from presentation attacks (PAs). Existing FAS methods usually supervise PA detectors with handcrafted binary or pixel-wise labels. However, handcrafted labels may not be the most adequate way to supervise PA detectors in learning sufficient and intrinsic spoofing cues. Instead of using handcrafted labels, we propose a novel Meta-Teacher FAS (MT-FAS) method that trains a meta-teacher to supervise PA detectors more effectively. The meta-teacher is trained in a bi-level optimization manner to learn how to supervise PA detectors in learning rich spoofing cues. The bi-level optimization contains two key components: 1) a lower-level training in which the meta-teacher supervises the detector's learning process on the training set; and 2) a higher-level training in which the meta-teacher's teaching performance is optimized by minimizing the detector's validation loss. Our meta-teacher differs significantly from existing teacher-student models because it is explicitly trained to teach the detector (student) better, whereas existing teachers are trained for outstanding accuracy while neglecting teaching ability. Extensive experiments on five FAS benchmarks show that, with the proposed MT-FAS, the trained meta-teacher 1) provides better-suited supervision than both handcrafted labels and existing teacher-student models; and 2) significantly improves the performance of PA detectors.

CV · Jul 25, 2021
PoseFace: Pose-Invariant Features and Pose-Adaptive Loss for Face Recognition

Qiang Meng, Xiaqing Xu, Xiaobo Wang et al.

Despite the great success achieved by deep learning methods in face recognition, severe performance drops are observed for large pose variations in unconstrained environments (e.g., surveillance and photo-tagging). To address this, current methods either deploy pose-specific models or frontalize faces with additional modules. Still, they ignore the fact that identity information should be consistent across poses, and they overlook the data imbalance between frontal and profile face images during training. In this paper, we propose an efficient PoseFace framework that utilizes facial landmarks to disentangle pose-invariant features and exploits a pose-adaptive loss to handle the imbalance issue adaptively. Extensive experimental results on the Multi-PIE, CFP, CPLFW and IJB benchmarks demonstrate the superiority of our method over state-of-the-art methods.

CV · Jul 16, 2020
Layer-Wise Adaptive Updating for Few-Shot Image Classification

Yunxiao Qin, Weiguo Zhang, Zezheng Wang et al.

Few-shot image classification (FSIC), which requires a model to recognize new categories by learning from a few images of these categories, has attracted much attention. Recently, meta-learning based methods have been shown to be a promising direction for FSIC. Commonly, they train a meta-learner (meta-learning model) to learn easily fine-tunable weights, and when solving an FSIC task, the meta-learner efficiently fine-tunes itself into a task-specific model by updating itself on the few images of the task. In this paper, we propose a novel meta-learning based layer-wise adaptive updating (LWAU) method for FSIC. LWAU is inspired by an interesting finding: compared with common deep models, the meta-learner pays much more attention to updating its top layer when learning from few images. According to this finding, we assume that the meta-learner may greatly prefer updating its top layer to updating its bottom layers for better FSIC performance. Therefore, in LWAU, the meta-learner is trained to learn not only the easily fine-tunable model but also its favorite layer-wise adaptive updating rule to improve its learning efficiency. Extensive experiments show that, with the layer-wise adaptive updating rule, the proposed LWAU: 1) outperforms existing few-shot classification methods by a clear margin; and 2) learns from few images at least 5 times more efficiently than existing meta-learners when solving FSIC.
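The inner-loop half of layer-wise adaptive updating can be sketched as ordinary gradient descent with one learning rate per layer. This assumes the learned updating rule reduces to per-layer step sizes (our simplification); the outer meta-training loop that learns those rates in LWAU is omitted, and the layer names, rates and toy quadratic loss are hypothetical.

```python
def layerwise_inner_update(params, grads, layer_lrs):
    """One inner-loop step: theta_l <- theta_l - alpha_l * grad_l for each layer l.

    params, grads: dict mapping layer name -> list of floats.
    layer_lrs: dict mapping layer name -> step size (meta-learned in LWAU;
    supplied by hand here). Returns new dicts; inputs are not modified."""
    return {
        name: [p - layer_lrs[name] * g for p, g in zip(values, grads[name])]
        for name, values in params.items()
    }


def fine_tune(params, grad_fn, layer_lrs, steps):
    """Run several inner-loop steps; grad_fn(params) returns per-layer gradients."""
    for _ in range(steps):
        params = layerwise_inner_update(params, grad_fn(params), layer_lrs)
    return params


# Hypothetical toy task: loss = 0.5 * sum(p ** 2), so the gradient equals p.
def toy_grad(params):
    return {name: list(values) for name, values in params.items()}


start = {"bottom": [1.0, -1.0], "top": [1.0, -1.0]}
lrs = {"bottom": 0.01, "top": 0.5}
adapted = fine_tune(start, toy_grad, lrs, steps=2)
```

Giving `top` a much larger rate than `bottom` mirrors the reported finding that the meta-learner prefers to update its top layer: after two steps the top layer has moved most of the way to the optimum while the bottom layer has barely changed.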

CV · Apr 29, 2019
Learning Meta Model for Zero- and Few-shot Face Anti-spoofing

Yunxiao Qin, Chenxu Zhao, Xiangyu Zhu et al.

Face anti-spoofing is crucial to the security of face recognition systems. Most previous methods formulate face anti-spoofing as a supervised learning problem to detect various predefined presentation attacks, which requires large-scale training data to cover as many attacks as possible. However, the trained model easily overfits to several common attacks and remains vulnerable to unseen attacks. To overcome this challenge, the detector should: 1) learn discriminative features from predefined presentation attacks that generalize to unseen spoofing types; and 2) quickly adapt to new spoofing types by learning from both the predefined attacks and a few examples of the new spoofing types. Therefore, we define face anti-spoofing as a zero- and few-shot learning problem. In this paper, we propose a novel Adaptive Inner-update Meta Face Anti-Spoofing (AIM-FAS) method to tackle this problem through meta-learning. Specifically, AIM-FAS trains a meta-learner focused on the task of detecting unseen spoofing types by learning from predefined living and spoofing faces and a few examples of new attacks. To assess the proposed approach, we propose several benchmarks for zero- and few-shot FAS. Experiments show that it outperforms existing methods on the presented benchmarks and on existing zero-shot FAS protocols.

CV · Dec 11, 2018
Prior-Knowledge and Attention-based Meta-Learning for Few-Shot Learning

Yunxiao Qin, Weiguo Zhang, Chenxu Zhao et al.

Recently, meta-learning has been shown to be a promising way to solve few-shot learning. In this paper, inspired by the human cognition process, which utilizes both prior knowledge and visual attention when learning new knowledge, we present a novel meta-learning paradigm with three developments that introduce an attention mechanism and prior knowledge into meta-learning. In our approach, prior knowledge helps the meta-learner express the input data in a high-level representation space, and the attention mechanism enables the meta-learner to focus on key features of the data in that representation space. Compared with existing meta-learning approaches that pay little attention to prior knowledge and visual attention, our approach alleviates the meta-learner's few-shot cognition burden. Furthermore, we discover a Task-Over-Fitting (TOF) problem, in which the meta-learner generalizes poorly on different K-shot learning tasks, and we propose a Cross-Entropy across Tasks (CET) metric to model and solve the TOF problem. Extensive experiments demonstrate that our improved meta-learner achieves state-of-the-art performance on several few-shot learning benchmarks, while the TOF problem is also greatly alleviated.

CV · Dec 2, 2018
A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing

Shifeng Zhang, Xiaobo Wang, Ajian Liu et al.

Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progress in recent years has been driven by the availability of face anti-spoofing benchmark datasets. However, existing face anti-spoofing benchmarks have a limited number of subjects (≤170) and modalities (≤2), which hinders further development in the academic community. To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and visual modalities. Specifically, it consists of 1,000 subjects with 21,000 videos, and each sample has 3 modalities (i.e., RGB, Depth and IR). We also provide a measurement set, an evaluation protocol and training/validation/testing subsets, establishing a new benchmark for face anti-spoofing. Moreover, we present a new multi-modal fusion method as a baseline, which performs feature re-weighting to select the more informative channel features while suppressing the less useful ones for each modality. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability. The dataset is available at https://sites.google.com/qq.com/chalearnfacespoofingattackdete
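The baseline's per-modality feature re-weighting resembles a squeeze-and-excitation gate. A minimal sketch, assuming that design (global average pooling, a bottleneck fully connected layer with ReLU, a fully connected layer with sigmoid, then channel-wise scaling) followed by concatenation of the three re-weighted modalities; the weight matrices and function names are hypothetical stand-ins for learned parameters.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def se_reweight(channels, w1, w2):
    """Squeeze-and-excitation gate for one modality.

    channels: list of C channels, each a flattened list of activations.
    w1: (C/r x C) and w2: (C x C/r) weight matrices (hypothetical here;
    learned in practice). Returns the scaled channels and their gates."""
    squeeze = [sum(ch) / len(ch) for ch in channels]      # global average pool
    hidden = [max(0.0, h) for h in matvec(w1, squeeze)]   # FC + ReLU
    gates = [sigmoid(g) for g in matvec(w2, hidden)]      # FC + sigmoid, in (0, 1)
    return [[g * v for v in ch] for g, ch in zip(gates, channels)], gates


def fuse_modalities(modalities, weights):
    """Re-weight each modality's channels, then concatenate along channels."""
    fused = []
    for name, channels in modalities.items():
        w1, w2 = weights[name]
        reweighted, _ = se_reweight(channels, w1, w2)
        fused.extend(reweighted)
    return fused
```

Gates close to 1 keep a channel almost unchanged and gates close to 0 suppress it, which is the "select informative channels, suppress less useful ones" behaviour the abstract describes.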

LG · Nov 19, 2018
Representation based and Attention augmented Meta learning

Yunxiao Qin, Chenxu Zhao, Zezheng Wang et al.

Deep learning based computer vision fails to work when labeled images are scarce. Recently, Meta learning has been confirmed as a promising way to improve the ability to learn from few images in computer vision. However, previous Meta learning approaches have two problems: 1) they ignore the importance of the attention mechanism for the Meta learner; 2) they do not give the Meta learner the ability to make good use of past knowledge, which can help express images as high-level representations, so the Meta learner has to solve few-shot learning tasks directly from the original high-dimensional RGB images. In this paper, we argue that the attention mechanism and past knowledge are crucial for the Meta learner, and that the Meta learner should be trained on high-level representations of the RGB images instead of directly on the original ones. Based on these arguments, we propose two methods: Attention augmented Meta Learning (AML) and Representation based and Attention augmented Meta Learning (RAML). AML aims to improve the Meta learner's attention ability by explicitly embedding an attention model into its network. RAML aims to give the Meta learner the ability to leverage previously learned knowledge to reduce the dimension of the original input data by expressing it as high-level representations, helping the Meta learner perform well. Extensive experiments demonstrate the effectiveness of the proposed models, with state-of-the-art few-shot learning performance on several few-shot learning benchmarks. The source code of our proposed methods will be released soon to facilitate further studies on the aforementioned problems.

CV · Nov 13, 2018
Exploiting temporal and depth information for multi-frame face anti-spoofing

Zezheng Wang, Chenxu Zhao, Yunxiao Qin et al.

Face anti-spoofing is significant to the security of face recognition systems. Previous works on depth-supervised learning have proven its effectiveness for face anti-spoofing. Nevertheless, they only consider depth as an auxiliary supervision in a single frame. Different from these methods, we develop a new method to estimate depth information from multiple RGB frames and propose a depth-supervised architecture that efficiently encodes spatiotemporal information for presentation attack detection. It includes two novel modules: an optical flow guided feature block (OFFB) and a convolutional gated recurrent units (ConvGRU) module, which are designed to extract short-term and long-term motion to discriminate living and spoofing faces. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art results on four benchmark datasets, namely OULU-NPU, SiW, CASIA-MFSD, and Replay-Attack.
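For reference, a generic ConvGRU cell replaces the matrix products of a GRU with convolutions so the hidden state keeps its spatial layout. A minimal single-channel sketch without bias terms, using zero-padded 3×3 kernels; it shows the standard update-gate and reset-gate equations, not the paper's exact configuration, and the kernel names are hypothetical.

```python
import math


def conv_same(x, k):
    """Single-channel 2-D cross-correlation with zero padding (same output size)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += k[i][j] * x[rr][cc]
            out[r][c] = acc
    return out


def elementwise(f, *maps):
    """Apply f position-wise across equally sized 2-D maps."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv_gru_step(x, h, k):
    """One ConvGRU update for single-channel maps x (input) and h (hidden state).

    k: dict of 3x3 kernels 'xz', 'hz', 'xr', 'hr', 'xh', 'hh' (hypothetical names)."""
    z = elementwise(lambda a, b: sigmoid(a + b), conv_same(x, k["xz"]), conv_same(h, k["hz"]))
    r = elementwise(lambda a, b: sigmoid(a + b), conv_same(x, k["xr"]), conv_same(h, k["hr"]))
    rh = elementwise(lambda a, b: a * b, r, h)                      # reset-gated state
    cand = elementwise(lambda a, b: math.tanh(a + b),
                       conv_same(x, k["xh"]), conv_same(rh, k["hh"]))
    # New state: interpolate between old state and candidate via the update gate.
    return elementwise(lambda zz, hh, cc: (1 - zz) * hh + zz * cc, z, h, cand)
```

With all-zero kernels both gates sit at 0.5 and the candidate is 0, so each step halves the hidden state, a quick sanity check on the gating arithmetic.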