CVAug 10, 2023
TrainFors: A Large Benchmark Training Dataset for Image Manipulation Detection and LocalizationSoumyaroop Nandi, Prem Natarajan, Wael Abd-Almageed
The evaluation datasets and metrics for image manipulation detection and localization (IMDL) research have been standardized. But the training dataset for such a task is still nonstandard. Previous researchers have used unconventional and deviating datasets to train neural networks for detecting image forgeries and localizing pixel maps of manipulated regions. For a fair comparison, the training set, test set, and evaluation metrics should be persistent. Hence, comparing the existing methods may not seem fair as the results depend heavily on the training datasets as well as the model architecture. Moreover, none of the previous works release the synthetic training dataset used for the IMDL task. We propose a standardized benchmark training dataset for image splicing, copy-move forgery, removal forgery, and image enhancement forgery. Furthermore, we identify the problems with the existing IMDL datasets and propose the required modifications. We also train the state-of-the-art IMDL methods on our proposed TrainFors1 dataset for a fair evaluation and report the actual performance of these methods under similar conditions.
CVSep 14, 2022
CAT: Controllable Attribute Translation for Fair Facial Attribute ClassificationJiazhi Li, Wael Abd-Almageed
As the social impact of visual recognition has been under scrutiny, several protected-attribute balanced datasets emerged to address dataset bias in imbalanced datasets. However, in facial attribute classification, dataset bias stems from both protected attribute level and facial attribute level, which makes it challenging to construct a multi-attribute-level balanced real dataset. To bridge the gap, we propose an effective pipeline to generate high-quality and sufficient facial images with desired facial attributes and supplement the original dataset to be a balanced dataset at both levels, which theoretically satisfies several fairness criteria. The effectiveness of our method is verified on sex classification and facial attribute classification by yielding comparable task performance as the original dataset and further improving fairness in a comprehensive fairness evaluation with a wide range of metrics. Furthermore, our method outperforms both resampling and balanced dataset construction to address dataset bias, and debiasing models to address task bias.
LGSep 21, 2022
SW-VAE: Weakly Supervised Learn Disentangled Representation Via Latent Factor SwappingJiageng Zhu, Hanchen Xie, Wael Abd-Almageed
Representation disentanglement is an important goal of representation learning that benefits various downstream tasks. To achieve this goal, many unsupervised learning representation disentanglement approaches have been developed. However, the training process without utilizing any supervision signal have been proved to be inadequate for disentanglement representation learning. Therefore, we propose a novel weakly-supervised training approach, named as SW-VAE, which incorporates pairs of input observations as supervision signals by using the generative factors of datasets. Furthermore, we introduce strategies to gradually increase the learning difficulty during training to smooth the training process. As shown on several datasets, our model shows significant improvement over state-of-the-art (SOTA) methods on representation disentanglement tasks.
LGSep 15, 2022
Weakly Supervised Invariant Representation Learning Via Disentangling Known and Unknown Nuisance FactorsJiageng Zhu, Hanchen Xie, Wael Abd-Almageed
Disentangled and invariant representations are two critical goals of representation learning and many approaches have been proposed to achieve either one of them. However, those two goals are actually complementary to each other so that we propose a framework to accomplish both of them simultaneously. We introduce a weakly supervised signal to learn disentangled representation which consists of three splits containing predictive, known nuisance and unknown nuisance information respectively. Furthermore, we incorporate contrastive method to enforce representation invariance. Experiments shows that the proposed method outperforms state-of-the-art (SOTA) methods on four standard benchmarks and shows that the proposed method can have better adversarial defense ability comparing to other methods without adversarial training.
LGJul 30, 2024
DiffusionCounterfactuals: Inferring High-dimensional Counterfactuals with Guidance of Causal RepresentationsJiageng Zhu, Hanchen Xie, Jiazhi Li et al.
Accurate estimation of counterfactual outcomes in high-dimensional data is crucial for decision-making and understanding causal relationships and intervention outcomes in various domains, including healthcare, economics, and social sciences. However, existing methods often struggle to generate accurate and consistent counterfactuals, particularly when the causal relationships are complex. We propose a novel framework that incorporates causal mechanisms and diffusion models to generate high-quality counterfactual samples guided by causal representation. Our approach introduces a novel, theoretically grounded training and sampling process that enables the model to consistently generate accurate counterfactual high-dimensional data under multiple intervention steps. Experimental results on various synthetic and real benchmarks demonstrate the proposed approach outperforms state-of-the-art methods in generating accurate and high-quality counterfactuals, using different evaluation metrics.
CVNov 8, 2021
Explaining Face Presentation Attack Detection Using Natural LanguageHengameh Mirzaalian, Mohamed E. Hussein, Leonidas Spinoulas et al.
A large number of deep neural network based techniques have been developed to address the challenging problem of face presentation attack detection (PAD). Whereas such techniques' focus has been on improving PAD performance in terms of classification accuracy and robustness against unseen attacks and environmental conditions, there exists little attention on the explainability of PAD predictions. In this paper, we tackle the problem of explaining PAD predictions through natural language. Our approach passes feature representations of a deep layer of the PAD model to a language model to generate text describing the reasoning behind the PAD prediction. Due to the limited amount of annotated data in our study, we apply a light-weight LSTM network as our natural language generation model. We investigate how the quality of the generated explanations is affected by different loss functions, including the commonly used word-wise cross entropy loss, a sentence discriminative loss, and a sentence semantic loss. We perform our experiments using face images from a dataset consisting of 1,105 bona-fide and 924 presentation attack samples. Our quantitative and qualitative results show the effectiveness of our model for generating proper PAD explanations through text as well as the power of the sentence-wise losses. To the best of our knowledge, this is the first introduction of a joint biometrics-NLP task. Our dataset can be obtained through our GitHub page.
CVNov 8, 2021
Information-Theoretic Bias Assessment Of Learned Representations Of Pretrained Face RecognitionJiazhi Li, Wael Abd-Almageed
As equality issues in the use of face recognition have garnered a lot of attention lately, greater efforts have been made to debiased deep learning models to improve fairness to minorities. However, there is still no clear definition nor sufficient analysis for bias assessment metrics. We propose an information-theoretic, independent bias assessment metric to identify degree of bias against protected demographic attributes from learned representations of pretrained facial recognition systems. Our metric differs from other methods that rely on classification accuracy or examine the differences between ground truth and predicted labels of protected attributes predicted using a shallow network. Also, we argue, theoretically and experimentally, that logits-level loss is not adequate to explain bias since predictors based on neural networks will always find correlations. Further, we present a synthetic dataset that mitigates the issue of insufficient samples in certain cohorts. Lastly, we establish a benchmark metric by presenting advantages in clear discrimination and small variation comparing with other metrics, and evaluate the performance of different debiased models with the proposed metric.
CVSep 15, 2021
Partner-Assisted Learning for Few-Shot Image ClassificationJiawei Ma, Hanchen Xie, Guangxing Han et al.
Few-shot Learning has been studied to mimic human visual capabilities and learn effective models without the need of exhaustive human annotation. Even though the idea of meta-learning for adaptation has dominated the few-shot learning methods, how to train a feature extractor is still a challenge. In this paper, we focus on the design of training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples. We propose a two-stage training scheme, Partner-Assisted Learning (PAL), which first trains a partner encoder to model pair-wise similarities and extract features serving as soft-anchors, and then trains a main encoder by aligning its outputs with soft-anchors while attempting to maximize classification performance. Two alignment constraints from logit-level and feature-level are designed individually. For each few-shot task, we perform prototype classification. Our method consistently outperforms the state-of-the-art method on four benchmarks. Detailed ablation studies of PAL are provided to justify the selection of each component involved in training.
CVAug 27, 2021
SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic SegmentationJiaxin Cheng, Soumyaroop Nandi, Prem Natarajan et al.
Unlike conventional zero-shot classification, zero-shot semantic segmentation predicts a class label at the pixel level instead of the image level. When solving zero-shot semantic segmentation problems, the need for pixel-level prediction with surrounding context motivates us to incorporate spatial information using positional encoding. We improve standard positional encoding by introducing the concept of Relative Positional Encoding, which integrates spatial information at the feature level and can handle arbitrary image sizes. Furthermore, while self-training is widely used in zero-shot semantic segmentation to generate pseudo-labels, we propose a new knowledge-distillation-inspired self-training strategy, namely Annealed Self-Training, which can automatically assign different importance to pseudo-labels to improve performance. We systematically study the proposed Relative Positional Encoding and Annealed Self-Training in a comprehensive experimental evaluation, and our empirical results confirm the effectiveness of our method on three benchmark datasets.
CVAug 27, 2021
Detection and Continual Learning of Novel Face Presentation AttacksMohammad Rostami, Leonidas Spinoulas, Mohamed Hussein et al.
Advances in deep learning, combined with availability of large datasets, have led to impressive improvements in face presentation attack detection research. However, state-of-the-art face antispoofing systems are still vulnerable to novel types of attacks that are never seen during training. Moreover, even if such attacks are correctly detected, these systems lack the ability to adapt to newly encountered attacks. The post-training ability of continually detecting new types of attacks and self-adaptation to identify these attack types, after the initial detection phase, is highly appealing. In this paper, we enable a deep neural network to detect anomalies in the observed input data points as potential new types of attacks by suppressing the confidence-level of the network outside the training samples' distribution. We then use experience replay to update the model to incorporate knowledge about new types of attacks without forgetting the past learned attack types. Experimental results are provided to demonstrate the effectiveness of the proposed method on two benchmark datasets as well as a newly introduced dataset which exhibits a large variety of attack types.
LGNov 30, 2020
MUSCLE: Strengthening Semi-Supervised Learning Via Concurrent Unsupervised Learning Using Mutual Information MaximizationHanchen Xie, Mohamed E. Hussein, Aram Galstyan et al.
Deep neural networks are powerful, massively parameterized machine learning models that have been shown to perform well in supervised learning tasks. However, very large amounts of labeled data are usually needed to train deep neural networks. Several semi-supervised learning approaches have been proposed to train neural networks using smaller amounts of labeled data with a large amount of unlabeled data. The performance of these semi-supervised methods significantly degrades as the size of labeled data decreases. We introduce Mutual-information-based Unsupervised & Semi-supervised Concurrent LEarning (MUSCLE), a hybrid learning approach that uses mutual information to combine both unsupervised and semi-supervised learning. MUSCLE can be used as a stand-alone training scheme for neural networks, and can also be incorporated into other learning approaches. We show that the proposed hybrid model outperforms state of the art on several standard benchmarks, including CIFAR-10, CIFAR-100, and Mini-Imagenet. Furthermore, the performance gain consistently increases with the reduction in the amount of labeled data, as well as in the presence of bias. We also show that MUSCLE has the potential to boost the classification performance when used in the fine-tuning phase for a model pre-trained only on unlabeled data.
CLJul 5, 2020
CORD19STS: COVID-19 Semantic Textual Similarity DatasetXiao Guo, Hengameh Mirzaalian, Ekraam Sabir et al.
In order to combat the COVID-19 pandemic, society can benefit from various natural language processing applications, such as dialog medical diagnosis systems and information retrieval engines calibrated specifically for COVID-19. These applications rely on the ability to measure semantic textual similarity (STS), making STS a fundamental task that can benefit several downstream applications. However, existing STS datasets and models fail to translate their performance to a domain-specific environment such as COVID-19. To overcome this gap, we introduce CORD19STS dataset which includes 13,710 annotated sentence pairs collected from COVID-19 open research dataset (CORD-19) challenge. To be specific, we generated one million sentence pairs using different sampling strategies. We then used a finetuned BERT-like language model, which we call Sen-SCI-CORD19-BERT, to calculate the similarity scores between sentence pairs to provide a balanced dataset with respect to the different semantic similarity levels, which gives us a total of 32K sentence pairs. Each sentence pair was annotated by five Amazon Mechanical Turk (AMT) crowd workers, where the labels represent different semantic similarity levels between the sentence pairs (i.e. related, somewhat-related, and not-related). After employing a rigorous qualification tasks to verify collected annotations, our final CORD19STS dataset includes 13,710 sentence pairs.
LGJun 6, 2019
On the Effectiveness of Laser Speckle Contrast Imaging and Deep Neural Networks for Detecting Known and Unknown Fingerprint Presentation AttacksHengameh Mirzaalian, Mohamed Hussein, Wael Abd-Almageed
Fingerprint presentation attack detection (FPAD) is becoming an increasingly challenging problem due to the continuous advancement of attack techniques, which generate `realistic-looking' fake fingerprint presentations. Recently, laser speckle contrast imaging (LSCI) has been introduced as a new sensing modality for FPAD. LSCI has the interesting characteristic of capturing the blood flow under the skin surface. Toward studying the importance and effectiveness of LSCI for FPAD, we conduct a comprehensive study using different patch-based deep neural network architectures. Our studied architectures include 2D and 3D convolutional networks as well as a recurrent network using long short-term memory (LSTM) units. The study demonstrates that strong FPAD performance can be achieved using LSCI. We evaluate the different models over a new large dataset. The dataset consists of 3743 bona fide samples, collected from 335 unique subjects, and 218 presentation attack samples, including six different types of attacks. To examine the effect of changing the training and testing sets, we conduct a 3-fold cross validation evaluation. To examine the effect of the presence of an unseen attack, we apply a leave-one-attack out strategy. The FPAD classification results of the networks, which are separately optimized and tuned for the temporal and spatial patch-sizes, indicate that the best performance is achieved by LSTM.
CVMar 18, 2019
QATM: Quality-Aware Template Matching For Deep LearningJiaxin Cheng, Yue Wu, Wael Abd-Almageed et al.
Finding a template in a search image is one of the core problems many computer vision, such as semantic image semantic, image-to-GPS verification \etc. We propose a novel quality-aware template matching method, QATM, which is not only used as a standalone template matching algorithm, but also a trainable layer that can be easily embedded into any deep neural network. Specifically, we assess the quality of a matching pair using soft-ranking among all matching pairs, and thus different matching scenarios such as 1-to-1, 1-to-many, and many-to-many will be all reflected to different values. Our extensive evaluation on classic template matching benchmarks and deep learning tasks demonstrate the effectiveness of QATM. It not only outperforms state-of-the-art template matching methods when used alone, but also largely improves existing deep network solutions.
CVNov 18, 2018
Image-to-GPS Verification Through A Bottom-Up Pattern Matching NetworkJiaxin Cheng, Yue Wu, Wael Abd-Almageed et al.
The image-to-GPS verification problem asks whether a given image is taken at a claimed GPS location. In this paper, we treat it as an image verification problem -- whether a query image is taken at the same place as a reference image retrieved at the claimed GPS location. We make three major contributions: 1) we propose a novel custom bottom-up pattern matching (BUPM) deep neural network solution; 2) we demonstrate that the verification can be directly done by cross-checking a perspective-looking query image and a panorama reference image, and 3) we collect and clean a dataset of 30K pairs query and reference. Our experimental results show that the proposed BUPM solution outperforms the state-of-the-art solutions in terms of both verification and localization.