CVJun 1
Chroma Clues: Leveraging Color Statistics to Detect Synthetic ImagesLea Uhlenbrock, Davide Cozzolino, Christian Riess
The evolution and dissemination of AI-synthesized images is occurring at an unprecedented rate. Image generators are making rapid progress in their goal of perfectly imitating natural images, which also challenges image forensics. In this work, we exploit an underexplored cue in current generative models, namely their weakness to imitate color statistics of natural images. We first show that the LPIPS loss used for training image generators is less sensitive to chrominance than to luminance, which may lead to statistical discrepancies in the colors of synthetic images. Building on this observation, we then introduce six hand-crafted color transformations and a method to learn a task-optimized color transform to statistically expose generated images. These transformations can be used in various ways. First, we define color-sensitive features at pixel-level or patch-level. A simple, interpretable classifier achieves with these features an average generalization accuracy of 93.27% and strong robustness against six types of post-processing. Second, we demonstrate that the transformations exhibit characteristic visual noise patterns in natural and synthetic image areas, which enables an intuitive visual image evaluation. Third, we demonstrate that the transforms can enhance color patterns in generated images for improved multiclass attribution.
CVJul 29, 2022
Forensic License Plate Recognition with Compression-Informed TransformersDenise Moussa, Anatol Maier, Andreas Spruck et al.
Forensic license plate recognition (FLPR) remains an open challenge in legal contexts such as criminal investigations, where unreadable license plates (LPs) need to be deciphered from highly compressed and/or low resolution footage, e.g., from surveillance cameras. In this work, we propose a side-informed Transformer architecture that embeds knowledge on the input compression level to improve recognition under strong compression. We show the effectiveness of Transformers for license plate recognition (LPR) on a low-quality real-world dataset. We also provide a synthetic dataset that includes strongly degraded, illegible LP images and analyze the impact of knowledge embedding on it. The network outperforms existing FLPR methods and standard state-of-the art image recognition models while requiring less parameters. For the severest degraded images, we can improve recognition by up to 8.9 percent points.
CVFeb 2, 2023
Benchmarking Probabilistic Deep Learning Methods for License Plate RecognitionFranziska Schirrmacher, Benedikt Lorch, Anatol Maier et al.
Learning-based algorithms for automated license plate recognition implicitly assume that the training and test data are well aligned. However, this may not be the case under extreme environmental conditions, or in forensic applications where the system cannot be trained for a specific acquisition device. Predictions on such out-of-distribution images have an increased chance of failing. But this failure case is oftentimes hard to recognize for a human operator or an automated system. Hence, in this work we propose to model the prediction uncertainty for license plate recognition explicitly. Such an uncertainty measure allows to detect false predictions, indicating an analyst when not to trust the result of the automated license plate recognition. In this paper, we compare three methods for uncertainty quantification on two architectures. The experiments on synthetic noisy or blurred low-resolution images show that the predictive uncertainty reliably finds wrong predictions. We also show that a multi-task combination of classification and super-resolution improves the recognition performance by 109\% and the detection of wrong predictions by 29 %.
CVMay 19, 2022
Bayesian Convolutional Neural Networks for Limited Data Hyperspectral Remote Sensing Image ClassificationMohammad Joshaghani, Amirabbas Davari, Faezeh Nejati Hatamian et al.
Employing deep neural networks for Hyperspectral remote sensing (HSRS) image classification is a challenging task. HSRS images have high dimensionality and a large number of channels with substantial redundancy between channels. In addition, the training data for classifying HSRS images is limited and the amount of available training data is much smaller compared to other classification tasks. These factors complicate the training process of deep neural networks with many parameters and cause them to not perform well even compared to conventional models. Moreover, convolutional neural networks produce over-confident predictions, which is highly undesirable considering the aforementioned problem. In this work, we use for HSRS image classification a special class of deep neural networks, namely a Bayesian neural network (BNN). To the extent of our knowledge, this is the first time that BNNs are used in HSRS image classification. BNNs inherently provide a measure for uncertainty. We perform extensive experiments on the Pavia Centre, Salinas, and Botswana datasets. We show that a BNN outperforms a standard convolutional neural network (CNN) and an off-the-shelf Random Forest (RF). Further experiments underline that the BNN is more stable and robust to model pruning, and that the uncertainty is higher for samples with higher expected prediction error.
CVMar 1, 2022
Compliance Challenges in Forensic Image Analysis Under the Artificial Intelligence ActBenedikt Lorch, Nicole Scheler, Christian Riess
In many applications of forensic image analysis, state-of-the-art results are nowadays achieved with machine learning methods. However, concerns about their reliability and opaqueness raise the question whether such methods can be used in criminal investigations. So far, this question of legal compliance has hardly been discussed, also because legal regulations for machine learning methods were not defined explicitly. To this end, the European Commission recently proposed the artificial intelligence (AI) act, a regulatory framework for the trustworthy use of AI. Under the draft AI act, high-risk AI systems for use in law enforcement are permitted but subject to compliance with mandatory requirements. In this paper, we review why the use of machine learning in forensic image analysis is classified as high-risk. We then summarize the mandatory requirements for high-risk AI systems and discuss these requirements in light of two forensic applications, license plate recognition and deep fake detection. The goal of this paper is to raise awareness of the upcoming legal requirements and to point out avenues for future research.
SDJul 29, 2022
Towards Unconstrained Audio Splicing Detection and Localization with Neural NetworksDenise Moussa, Germans Hirsch, Christian Riess
Freely available and easy-to-use audio editing tools make it straightforward to perform audio splicing. Convincing forgeries can be created by combining various speech samples from the same person. Detection of such splices is important both in the public sector when considering misinformation, and in a legal context to verify the integrity of evidence. Unfortunately, most existing detection algorithms for audio splicing use handcrafted features and make specific assumptions. However, criminal investigators are often faced with audio samples from unconstrained sources with unknown characteristics, which raises the need for more generally applicable methods. With this work, we aim to take a first step towards unconstrained audio splicing detection to address this need. We simulate various attack scenarios in the form of post-processing operations that may disguise splicing. We propose a Transformer sequence-to-sequence (seq2seq) network for splicing detection and localization. Our extensive evaluation shows that the proposed method outperforms existing dedicated approaches for splicing detection [3, 10] as well as the general-purpose networks EfficientNet [28] and RegNet [25].
CVAug 24, 2023
LORD: Leveraging Open-Set Recognition with Unknown DataTobias Koch, Christian Riess, Thomas Köhler
Handling entirely unknown data is a challenge for any deployed classifier. Classification models are typically trained on a static pre-defined dataset and are kept in the dark for the open unassigned feature space. As a result, they struggle to deal with out-of-distribution data during inference. Addressing this task on the class-level is termed open-set recognition (OSR). However, most OSR methods are inherently limited, as they train closed-set classifiers and only adapt the downstream predictions to OSR. This work presents LORD, a framework to Leverage Open-set Recognition by exploiting unknown Data. LORD explicitly models open space during classifier training and provides a systematic evaluation for such approaches. We identify three model-agnostic training strategies that exploit background data and applied them to well-established classifiers. Due to LORD's extensive evaluation protocol, we consistently demonstrate improved recognition of unknown data. The benchmarks facilitate in-depth analysis across various requirement levels. To mitigate dependency on extensive and costly background datasets, we explore mixup as an off-the-shelf data generation technique. Our experiments highlight mixup's effectiveness as a substitute for background datasets. Lightweight constraints on mixup synthesis further improve OSR performance.
CVSep 27, 2022
3D Rendering Framework for Data Augmentation in Optical Character RecognitionAndreas Spruck, Maximiliane Hawesch, Anatol Maier et al.
In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR dataset. Its modular structure allows to be modified to match individual user requirements. The framework enables to comfortably scale the enlargement factor of the available dataset. Furthermore, the proposed method is not restricted to single frame OCR but can also be applied to video OCR. We demonstrate the performance of our framework by augmenting a 15% subset of the common Brno Mobile OCR dataset. Our proposed framework is capable of leveraging the performance of OCR applications especially for small datasets. Applying the proposed method, improvements of up to 2.79 percentage points in terms of Character Error Rate (CER), and up to 7.88 percentage points in terms of Word Error Rate (WER) are achieved on the subset. Especially the recognition of challenging text lines can be improved. The CER may be decreased by up to 14.92 percentage points and the WER by up to 18.19 percentage points for this class. Moreover, we are able to achieve smaller error rates when training on the 15% subset augmented with the proposed method than on the original non-augmented full dataset.
CVJun 21, 2022
Deep Metric Color Embeddings for Splicing Localization in Severely Degraded ImagesBenjamin Hadwiger, Christian Riess
One common task in image forensics is to detect spliced images, where multiple source images are composed to one output image. Most of the currently best performing splicing detectors leverage high-frequency artifacts. However, after an image underwent strong compression, most of the high frequency artifacts are not available anymore. In this work, we explore an alternative approach to splicing detection, which is potentially better suited for images in-the-wild, subject to strong compression and downsampling. Our proposal is to model the color formation of an image. The color formation largely depends on variations at the scale of scene objects, and is hence much less dependent on high-frequency artifacts. We learn a deep metric space that is on one hand sensitive to illumination color and camera white-point estimation, but on the other hand insensitive to variations in object color. Large distances in the embedding space indicate that two image regions either stem from different scenes or different cameras. In our evaluation, we show that the proposed embedding space outperforms the state of the art on images that have been subject to strong compression and downsampling. We confirm in two further experiments the dual nature of the metric space, namely to both characterize the acquisition camera and the scene illuminant color. As such, this work resides at the intersection of physics-based and statistical forensics with benefits from both sides.
LGMay 30, 2022
Exploring the Open World Using Incremental Extreme Value MachinesTobias Koch, Felix Liebezeit, Christian Riess et al.
Dynamic environments require adaptive applications. One particular machine learning problem in dynamic environments is open world recognition. It characterizes a continuously changing domain where only some classes are seen in one batch of the training data and such batches can only be learned incrementally. Open world recognition is a demanding task that is, to the best of our knowledge, addressed by only a few methods. This work introduces a modification of the widely known Extreme Value Machine (EVM) to enable open world recognition. Our proposed method extends the EVM with a partial model fitting function by neglecting unaffected space during an update. This reduces the training time by a factor of 28. In addition, we provide a modified model reduction using weighted maximum K-set cover to strictly bound the model complexity and reduce the computational effort by a factor of 3.5 from 2.1 s to 0.6 s. In our experiments, we rigorously evaluate openness with two novel evaluation protocols. The proposed method achieves superior accuracy of about 12 % and computational efficiency in the tasks of image classification and face recognition.
CVSep 28, 2022
Synthesizing Annotated Image and Video Data Using a Rendering-Based Pipeline for Improved License Plate RecognitionAndreas Spruck, Maximilane Gruber, Anatol Maier et al.
An insufficient number of training samples is a common problem in neural network applications. While data augmentation methods require at least a minimum number of samples, we propose a novel, rendering-based pipeline for synthesizing annotated data sets. Our method does not modify existing samples but synthesizes entirely new samples. The proposed rendering-based pipeline is capable of generating and annotating synthetic and partly-real image and video data in a fully automatic procedure. Moreover, the pipeline can aid the acquisition of real data. The proposed pipeline is based on a rendering process. This process generates synthetic data. Partly-real data bring the synthetic sequences closer to reality by incorporating real cameras during the acquisition process. The benefits of the proposed data generation pipeline, especially for machine learning scenarios with limited available training data, are demonstrated by an extensive experimental validation in the context of automatic license plate recognition. The experiments demonstrate a significant reduction of the character error rate and miss rate from 73.74% and 100% to 14.11% and 41.27% respectively, compared to an OCR algorithm trained on a real data set solely. These improvements are achieved by training the algorithm on synthesized data solely. When additionally incorporating real data, the error rates can be decreased further. Thereby, the character error rate and miss rate can be reduced to 11.90% and 39.88% respectively. All data used during the experiments as well as the proposed rendering-based pipeline for the automated data generation is made publicly available under (URL will be revealed upon publication).
CVAug 20, 2024
Trustworthy Compression? Impact of AI-based Codecs on Biometrics for Law EnforcementSandra Bergmann, Denise Moussa, Christian Riess
Image-based biometrics can aid law enforcement in various aspects, for example in iris, fingerprint and soft-biometric recognition. A critical precondition for recognition is the availability of sufficient biometric information in images. It is visually apparent that strong JPEG compression removes such details. However, latest AI-based image compression seemingly preserves many image details even for very strong compression factors. Yet, these perceived details are not necessarily grounded in measurements, which raises the question whether these images can still be used for biometric recognition. In this work, we investigate how AI compression impacts iris, fingerprint and soft-biometric (fabrics and tattoo) images. We also investigate the recognition performance for iris and fingerprint images after AI compression. It turns out that iris recognition can be strongly affected, while fingerprint recognition is quite robust. The loss of detail is qualitatively best seen in fabrics and tattoos images. Overall, our results show that AI-compression still permits many biometric tasks, but attention to strong compression factors in sensitive tasks is advisable.
SDMay 3, 2024
EnvId: A Metric Learning Approach for Forensic Few-Shot Identification of Unseen EnvironmentsDenise Moussa, Germans Hirsch, Christian Riess
Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of a recorded audio to its recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. Up to now, several works provide supervised classification tools for closed-set recording environment identification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, supervised learning techniques are not applicable without retraining a classifier on a sufficient amount of training samples for each case and respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining by modeling the task as a few-shot classification problem. We demonstrate that EnvId can handle forensically challenging material. It provides good quality predictions even under unseen signal degradations, out-of-distribution reverberation characteristics or recording position mismatches.
CVApr 4, 2025
Three Forensic Cues for JPEG AI ImagesSandra Bergmann, Fabian Brand, Christian Riess
The JPEG standard was vastly successful. Currently, the first AI-based compression method ``JPEG AI'' will be standardized. JPEG AI brings remarkable benefits. JPEG AI images exhibit impressive image quality at bitrates that are an order of magnitude lower than images compressed with traditional JPEG. However, forensic analysis of JPEG AI has to be completely re-thought: forensic tools for traditional JPEG do not transfer to JPEG AI, and artifacts from JPEG AI are easily confused with artifacts from artificially generated images (``DeepFakes''). This creates a need for novel forensic approaches to detection and distinction of JPEG AI images. In this work, we make a first step towards a forensic JPEG AI toolset. We propose three cues for forensic algorithms for JPEG AI. These algorithms address three forensic questions: first, we show that the JPEG AI preprocessing introduces correlations in the color channels that do not occur in uncompressed images. Second, we show that repeated compression of JPEG AI images leads to diminishing distortion differences. This can be used to detect recompression, in a spirit similar to some classic JPEG forensics methods. Third, we show that the quantization of JPEG AI images in the latent space can be used to distinguish real images with JPEG AI compression from synthetically generated images. The proposed methods are interpretable for a forensic analyst, and we hope that they inspire further research in the forensics of AI-compressed images.
LGNov 12, 2025
Sure! Here's a short and concise title for your paper: "Contamination in Generated Text Detection Benchmarks"Philipp Dingfelder, Christian Riess
Large language models are increasingly used for many applications. To prevent illicit use, it is desirable to be able to detect AI-generated text. Training and evaluation of such detectors critically depend on suitable benchmark datasets. Several groups took on the tedious work of collecting, curating, and publishing large and diverse datasets for this task. However, it remains an open challenge to ensure high quality in all relevant aspects of such a dataset. For example, the DetectRL benchmark exhibits relatively simple patterns of AI-generation in 98.5% of the Claude-LLM data. These patterns may include introductory words such as "Sure! Here is the academic article abstract:", or instances where the LLM rejects the prompted task. In this work, we demonstrate that detectors trained on such data use such patterns as shortcuts, which facilitates spoofing attacks on the trained detectors. We consequently reprocessed the DetectRL dataset with several cleansing operations. Experiments show that such data cleansing makes direct attacks more difficult. The reprocessed dataset is publicly available.
LGJul 2, 2025
BranchNet: A Neuro-Symbolic Learning Framework for Structured Multi-Class ClassificationDalia Rodríguez-Salas, Christian Riess
We introduce BranchNet, a neuro-symbolic learning framework that transforms decision tree ensembles into sparse, partially connected neural networks. Each branch, defined as a decision path from root to a parent of leaves, is mapped to a hidden neuron, preserving symbolic structure while enabling gradient-based optimization. The resulting models are compact, interpretable, and require no manual architecture tuning. Evaluated on a suite of structured multi-class classification benchmarks, BranchNet consistently outperforms XGBoost in accuracy, with statistically significant gains. We detail the architecture, training procedure, and sparsity dynamics, and discuss the model's strengths in symbolic interpretability as well as its current limitations, particularly on binary tasks where further adaptive calibration may be beneficial.
CVFeb 9, 2021
Deep learning architectural designs for super-resolution of noisy imagesAngel Villar-Corrales, Franziska Schirrmacher, Christian Riess
Recent advances in deep learning have led to significant improvements in single image super-resolution (SR) research. However, due to the amplification of noise during the upsampling steps, state-of-the-art methods often fail at reconstructing high-resolution images from noisy versions of their low-resolution counterparts. However, this is especially important for images from unknown cameras with unseen types of image degradation. In this work, we propose to jointly perform denoising and super-resolution. To this end, we investigate two architectural designs: "in-network" combines both tasks at feature level, while "pre-network" first performs denoising and then super-resolution. Our experiments show that both variants have specific advantages: The in-network design obtains the strongest results when the type of image corruption is aligned in the training and testing dataset, for any choice of denoiser. The pre-network design exhibits superior performance on unseen types of image corruption, which is a pathological failure case of existing super-resolution models. We hope that these findings help to enable super-resolution also in less constrained scenarios where source camera or imaging conditions are not well controlled. Source code and pretrained models are available at https://github.com/ angelvillar96/super-resolution-noisy-images.
LGJan 8, 2021
Synthetic Glacier SAR Image Generation from Arbitrary Masks Using Pix2Pix AlgorithmRosanna Dietrich-Sussner, Amirabbas Davari, Thorsten Seehaus et al.
Supervised machine learning requires a large amount of labeled data to achieve proper test results. However, generating accurately labeled segmentation maps on remote sensing imagery, including images from synthetic aperture radar (SAR), is tedious and highly subjective. In this work, we propose to alleviate the issue of limited training data by generating synthetic SAR images with the pix2pix algorithm. This algorithm uses conditional Generative Adversarial Networks (cGANs) to generate an artificial image while preserving the structure of the input. In our case, the input is a segmentation mask, from which a corresponding synthetic SAR image is generated. We present different models, perform a comparative study and demonstrate that this approach synthesizes convincing glaciers in SAR images with promising qualitative and quantitative results.
CVNov 4, 2020
The Forchheim Image Database for Camera Identification in the WildBenjamin Hadwiger, Christian Riess
Image provenance can represent crucial knowledge in criminal investigation and journalistic fact checking. In the last two decades, numerous algorithms have been proposed for obtaining information on the source camera and distribution history of an image. For a fair ranking of these techniques, it is important to rigorously assess their performance on practically relevant test cases. To this end, a number of datasets have been proposed. However, we argue that there is a gap in existing databases: to our knowledge, there is currently no dataset that simultaneously satisfies two goals, namely a) to cleanly separate scene content and forensic traces, and b) to support realistic post-processing like social media recompression. In this work, we propose the Forchheim Image Database (FODB) to close this gap. It consists of more than 23,000 images of 143 scenes by 27 smartphone cameras, and it allows to cleanly separate image content from forensic artifacts. Each image is provided in 6 different qualities: the original camera-native version, and five copies from social networks. We demonstrate the usefulness of FODB in an evaluation of methods for camera identification. We report three findings. First, the recently proposed general-purpose EfficientNet remarkably outperforms several dedicated forensic CNNs both on clean and compressed images. Second, classifiers obtain a performance boost even on unknown post-processing after augmentation by artificial degradations. Third, FODB's clean separation of scene content and forensic traces imposes important, rigorous boundary conditions for algorithm benchmarking.
CVOct 27, 2020
Reconstruction of Voxels with Position- and Angle-Dependent WeightingsLina Felsner, Tobias Würfl, Christopher Syben et al.
The reconstruction problem of voxels with individual weightings can be modeled a position- and angle- dependent function in the forward-projection. This changes the system matrix and prohibits to use standard filtered backprojection. In this work we first formulate this reconstruction problem in terms of a system matrix and weighting part. We compute the pseudoinverse and show that the solution is rank-deficient and hence very ill posed. This is a fundamental limitation for reconstruction. We then derive an iterative solution and experimentally show its uperiority to any closed-form solution.
LGJul 28, 2020
Toward Reliable Models for Authenticating Multimedia Content: Detecting Resampling Artifacts With Bayesian Neural NetworksAnatol Maier, Benedikt Lorch, Christian Riess
In multimedia forensics, learning-based methods provide state-of-the-art performance in determining origin and authenticity of images and videos. However, most existing methods are challenged by out-of-distribution data, i.e., with characteristics that are not covered in the training set. This makes it difficult to know when to trust a model, particularly for practitioners with limited technical background. In this work, we make a first step toward redesigning forensic algorithms with a strong focus on reliability. To this end, we propose to use Bayesian neural networks (BNN), which combine the power of deep neural networks with the rigorous probabilistic formulation of a Bayesian framework. Instead of providing a point estimate like standard neural networks, BNNs provide distributions that express both the estimate and also an uncertainty range. We demonstrate the usefulness of this framework on a classical forensic task: resampling detection. The BNN yields state-of-the-art detection performance, plus excellent capabilities for detecting out-of-distribution samples. This is demonstrated for three pathologic issues in resampling detection, namely unseen resampling factors, unseen JPEG compression, and unseen resampling algorithms. We hope that this proposal spurs further research toward reliability in multimedia forensics.
IVNov 12, 2019
Merging-ISP: Multi-Exposure High Dynamic Range Image Signal ProcessingPrashant Chaudhari, Franziska Schirrmacher, Andreas Maier et al.
High dynamic range (HDR) imaging combines multiple images with different exposure times into a single high-quality image. The image signal processing pipeline (ISP) is a core component in digital cameras to perform these operations. It includes demosaicing of raw color filter array (CFA) data at different exposure times, alignment of the exposures, conversion to HDR domain, and exposure merging into an HDR image. Traditionally, such pipelines cascade algorithms that address these individual subtasks. However, cascaded designs suffer from error propagation, since simply combining multiple steps is not necessarily optimal for the entire imaging task. This paper proposes a multi-exposure HDR image signal processing pipeline (Merging-ISP) to jointly solve all these subtasks. Our pipeline is modeled by a deep neural network architecture. As such, it is end-to-end trainable, circumvents the use of hand-crafted and potentially complex algorithms, and mitigates error propagation. Merging-ISP enables direct reconstructions of HDR images of dynamic scenes from multiple raw CFA images with different exposures. We compare Merging-ISP against several state-of-the-art cascaded pipelines. The proposed method provides HDR reconstructions of high perceptual quality and it quantitatively outperforms competing ISPs by more than 1 dB in terms of PSNR.
CVJan 25, 2019
FaceForensics++: Learning to Detect Manipulated Facial ImagesAndreas Rössler, Davide Cozzolino, Luisa Verdoliva et al.
The rapid progress in synthetic image generation and manipulation has now come to a point where it raises significant concerns for the implications towards society. At best, this leads to a loss of trust in digital content, but could potentially cause further harm by spreading false information or fake news. This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. In particular, the benchmark is based on DeepFakes, Face2Face, FaceSwap and NeuralTextures as prominent representatives for facial manipulations at random compression level and size. The benchmark is publicly available and contains a hidden test set as well as a database of over 1.8 million manipulated images. This dataset is over an order of magnitude larger than comparable, publicly available, forgery datasets. Based on this data, we performed a thorough analysis of data-driven forgery detectors. We show that the use of additional domainspecific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers.
CVDec 6, 2018
ForensicTransfer: Weakly-supervised Domain Adaptation for Forgery DetectionDavide Cozzolino, Justus Thies, Andreas Rössler et al.
Distinguishing manipulated from real images is becoming increasingly difficult as new sophisticated image forgery approaches come out by the day. Naive classification approaches based on Convolutional Neural Networks (CNNs) show excellent performance in detecting image manipulations when they are trained on a specific forgery method. However, on examples from unseen manipulation approaches, their performance drops significantly. To address this limitation in transferability, we introduce Forensic-Transfer (FT). We devise a learning-based forensic detector which adapts well to new domains, i.e., novel manipulation methods and can handle scenarios where only a handful of fake examples are available during training. To this end, we learn a forensic embedding based on a novel autoencoder-based architecture that can be used to distinguish between real and fake imagery. The learned embedding acts as a form of anomaly detector; namely, an image manipulated from an unseen method will be detected as fake provided it maps sufficiently far away from the cluster of real images. Comparing to prior works, FT shows significant improvements in transferability, which we demonstrate in a series of experiments on cutting-edge benchmarks. For instance, on unseen examples, we achieve up to 85% in terms of accuracy, and with only a handful of seen examples, our performance already reaches around 95%.
MED-PHNov 11, 2018
A 3-D Projection Model for X-ray Dark-field ImagingShiyang Hu, Lina Felsner, Andreas Maier et al.
Talbot-Lau X-ray phase-contrast imaging is a novel imaging modality, which provides not only an X-ray absorption image, but also additionally a differential phase image and a dark-field image. The dark-field image is related to small angle scattering and has an interesting property when canning oriented structures: the recorded signal depends on the relative orientation of the structure in the imaging system. Exactly this property allows to draw conclusions about the orientation and to reconstruct the structure. However, the reconstruction is a complex, non-trivial challenge. A lot of research was conducted towards this goal in the last years and several reconstruction algorithms were proposed. A key step of the reconstruction algorithm is the inversion of a forward projection model. Up until now, only 2-D projection models are available, with effectively limit the scanning trajectory to a 2-D plane. To obtain true 3-D information, this limitation requires to combine several 2-D scans, which leads to quite complex, impractical acquisitions schemes. Furthermore, it is not possible with these models to use 3-D trajectories that might allow simpler protocols, like for example a helical trajectory. To address these limitations, we propose in this work a very general 3-D projection model. Our projection model defines the dark-field signal dependent on an arbitrarily chosen ray and sensitivity direction. We derive the projection model under the assumption that the observed scatter distribution has a Gaussian shape. We theoretically show the consistency of our model with more constrained existing 2-D models. Furthermore, we experimentally show the compatibility of our model with dark-field measurements of two matchsticks. We believe that this 3-D projection model is an important step towards more flexible trajectories and imaging protocols that are much better applicable in practice.
CVOct 12, 2018
A Gentle Introduction to Deep Learning in Medical Image ProcessingAndreas Maier, Christopher Syben, Tobias Lasser et al.
This paper tries to give a gentle introduction to deep learning in medical image processing, proceeding from theoretical foundations to applications. We first discuss general reasons for the popularity of deep learning, including several major breakthroughs in computer science. Next, we start reviewing the fundamental basics of the perceptron and neural networks, along with some fundamental theory that is often omitted. Doing so allows us to understand the reasons for the rise of deep learning in many application domains. Obviously medical image processing is one of these areas which has been largely affected by this rapid progress, in particular in image detection and recognition, image segmentation, image registration, and computer-aided diagnosis. There are also recent trends in physical simulation, modelling, and reconstruction that have led to astonishing results. Yet, some of these approaches neglect prior knowledge and hence bear the risk of producing implausible results. These apparent weaknesses highlight current limitations of deep learning. However, we also briefly discuss promising approaches that might be able to resolve these problems in the future.
CVSep 17, 2018
Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real DataThomas Köhler, Michel Bätz, Farzad Naderi et al.
Capturing ground truth data to benchmark super-resolution (SR) is challenging. Therefore, current quantitative studies are mainly evaluated on simulated data artificially sampled from ground truth images. We argue that such evaluations overestimate the actual performance of SR methods compared to their behavior on real images. Toward bridging this simulated-to-real gap, we introduce the Super-Resolution Erlangen (SupER) database, the first comprehensive laboratory SR database of all-real acquisitions with pixel-wise ground truth. It consists of more than 80k images of 14 scenes combining different facets: CMOS sensor noise, real sampling at four resolution levels, nine scene motion types, two photometric conditions, and lossy video coding at five levels. As such, the database exceeds existing benchmarks by an order of magnitude in quality and quantity. This paper also benchmarks 19 popular single-image and multi-frame algorithms on our data. The benchmark comprises a quantitative study by exploiting ground truth data and qualitative evaluations in a large-scale observer study. We also rigorously investigate agreements between both evaluations from a statistical perspective. One interesting result is that top-performing methods on simulated data may be surpassed by others on real data. Our insights can spur further algorithm development, and the publicy available dataset can foster future evaluations.
CVJul 8, 2018
Automatic Classification of Defective Photovoltaic Module Cells in Electroluminescence ImagesSergiu Deitsch, Vincent Christlein, Stephan Berger et al.
Electroluminescence (EL) imaging is a useful modality for the inspection of photovoltaic (PV) modules. EL images provide high spatial resolution, which makes it possible to detect even finest defects on the surface of PV modules. However, the analysis of EL images is typically a manual process that is expensive, time-consuming, and requires expert knowledge of many different types of defects. In this work, we investigate two approaches for automatic detection of such defects in a single image of a PV cell. The approaches differ in their hardware requirements, which are dictated by their respective application scenarios. The more hardware-efficient approach is based on hand-crafted features that are classified in a Support Vector Machine (SVM). To obtain a strong performance, we investigate and compare various processing variants. The more hardware-demanding approach uses an end-to-end deep Convolutional Neural Network (CNN) that runs on a Graphics Processing Unit (GPU). Both approaches are trained on 1,968 cells extracted from high resolution EL intensity images of mono- and polycrystalline PV modules. The CNN is more accurate, and reaches an average accuracy of 88.42%. The SVM achieves a slightly lower average accuracy of 82.44%, but can run on arbitrary hardware. Both automated approaches make continuous, highly accurate monitoring of PV cells feasible.
CVJun 18, 2018
Segmentation of Photovoltaic Module Cells in Uncalibrated Electroluminescence ImagesSergiu Deitsch, Claudia Buerhop-Lutz, Evgenii Sovetkin et al.
High resolution electroluminescence (EL) images captured in the infrared spectrum allow to visually and non-destructively inspect the quality of photovoltaic (PV) modules. Currently, however, such a visual inspection requires trained experts to discern different kinds of defects, which is time-consuming and expensive. Automated segmentation of cells is therefore a key step in automating the visual inspection workflow. In this work, we propose a robust automated segmentation method for extraction of individual solar cells from EL images of PV modules. This enables controlled studies on large amounts of data to understanding the effects of module degradation over time-a process not yet fully understood. The proposed method infers in several steps a high-level solar module representation from low-level edge features. An important step in the algorithm is to formulate the segmentation problem in terms of lens calibration by exploiting the plumbline constraint. We evaluate our method on a dataset of various solar modules types containing a total of 408 solar cells with various defects. Our method robustly solves this task with a median weighted Jaccard index of 94.47% and an $F_1$ score of 97.62%, both indicating a very high similarity between automatically segmented and ground truth solar cell masks.
CVApr 6, 2018
Adaptive Quantile Sparse Image (AQuaSI) Prior for Inverse Imaging ProblemsFranziska Schirrmacher, Thomas Köhler, Christian Riess
Inverse problems play a central role for many classical computer vision and image processing tasks. Many inverse problems are ill-posed, and hence require a prior to regularize the solution space. However, many of the existing priors, like total variation, are based on ad-hoc assumptions that have difficulties to represent the actual distribution of natural images. Thus, a key challenge in research on image processing is to find better suited priors to represent natural images. In this work, we propose the Adaptive Quantile Sparse Image (AQuaSI) prior. It is based on a quantile filter, can be used as a joint filter on guidance data, and be readily plugged into a wide range of numerical optimization algorithms. We demonstrate the efficacy of the proposed prior in joint RGB/depth upsampling, on RGB/NIR image restoration, and in a comparison with related regularization by denoising approaches.
CVMar 24, 2018
FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human FacesAndreas Rössler, Davide Cozzolino, Luisa Verdoliva et al.
With recent advances in computer vision and graphics, it is now possible to generate videos with extremely realistic synthetic faces, even in real time. Countless applications are possible, some of which raise a legitimate alarm, calling for reliable detectors of fake videos. In fact, distinguishing between original and manipulated video can be a challenge for humans and computers alike, especially when the videos are compressed or have low resolution, as it often happens on social networks. Research on the detection of face manipulations has been seriously hampered by the lack of adequate datasets. To this end, we introduce a novel face manipulation dataset of about half a million edited images (from over 1000 videos). The manipulations have been generated with a state-of-the-art face editing approach. It exceeds all existing video manipulation datasets by at least an order of magnitude. Using our new dataset, we introduce benchmarks for classical image forensic tasks, including classification and segmentation, considering videos compressed at various quality levels. In addition, we introduce a benchmark evaluation for creating indistinguishable forgeries with known ground truth; for instance with generative refinement models.
CVJan 29, 2018
Hyper-Hue and EMAP on Hyperspectral Images for Supervised Layer Decomposition of Old Master DrawingsAmirAbbas Davari, Nikolaos Sakaltras, Armin Haeberle et al.
Old master drawings were mostly created step by step in several layers using different materials. To art historians and restorers, examination of these layers brings various insights into the artistic work process and helps to answer questions about the object, its attribution and its authenticity. However, these layers typically overlap and are oftentimes difficult to differentiate with the unaided eye. For example, a common layer combination is red chalk under ink. In this work, we propose an image processing pipeline that operates on hyperspectral images to separate such layers. Using this pipeline, we show that hyperspectral images enable better layer separation than RGB images, and that spectral focus stacking aids the layer separation. In particular, we propose to use two descriptors in hyperspectral historical document analysis, namely hyper-hue and extended multi-attribute profile (EMAP). Our comparative results with other features underline the efficacy of the three proposed improvements.
CVDec 13, 2017
GMM-Based Synthetic Samples for Classification of Hyperspectral Images With Limited Training DataAmirAbbas Davari, Erchan Aptoula, Berrin Yanikoglu et al.
The amount of training data that is required to train a classifier scales with the dimensionality of the feature data. In hyperspectral remote sensing, feature data can potentially become very high dimensional. However, the amount of training data is oftentimes limited. Thus, one of the core challenges in hyperspectral remote sensing is how to perform multi-class classification using only relatively few training data points. In this work, we address this issue by enriching the feature matrix with synthetically generated sample points. This synthetic data is sampled from a GMM fitted to each class of the limited training data. Although, the true distribution of features may not be perfectly modeled by the fitted GMM, we demonstrate that a moderate augmentation by these synthetic samples can effectively replace a part of the missing training samples. We show the efficacy of the proposed approach on two hyperspectral datasets. The median gain in classification performance is $5\%$. It is also encouraging that this performance gain is remarkably stable for large variations in the number of added samples, which makes it much easier to apply this method to real-world applications.
CVDec 12, 2017
Image Registration for the Alignment of Digitized Historical DocumentsAmirAbbas Davari, Tobias Lindenberger, Armin Häberle et al.
In this work, we conducted a survey on different registration algorithms and investigated their suitability for hyperspectral historical image registration applications. After the evaluation of different algorithms, we choose an intensity based registration algorithm with a curved transformation model. For the transformation model, we select cubic B-splines since they should be capable to cope with all non-rigid deformations in our hyperspectral images. From a number of similarity measures, we found that residual complexity and localized mutual information are well suited for the task at hand. In our evaluation, both measures show an acceptable performance in handling all difficulties, e.g., capture range, non-stationary and spatially varying intensity distortions or multi-modality that occur in our application.
CVDec 10, 2017
Sketch Layer Separation in Multi-Spectral Historical Document ImagesAmirAbbas Davari, Armin Häberle, Vincent Christlein et al.
High-resolution imaging has delivered new prospects for detecting the material composition and structure of cultural treasures. Despite the various techniques for analysis, a significant diagnostic gap remained in the range of available research capabilities for works on paper. Old master drawings were mostly composed in a multi-step manner with various materials. This resulted in the overlapping of different layers which made the subjacent strata difficult to differentiate. The separation of stratified layers using imaging methods could provide insights into the artistic work processes and help answer questions about the object, its attribution, or in identifying forgeries. The pattern recognition procedure was tested with mock replicas to achieve the separation and the capability of displaying concealed red chalk under ink. In contrast to RGB-sensor based imaging, the multi- or hyperspectral technology allows accurate layer separation by recording the characteristic signatures of the material's reflectance. The risk of damage to the artworks as a result of the examination can be reduced by using combinations of defined spectra for lightning and image capturing. By guaranteeing the maximum level of readability, our results suggest that the technique can be applied to a broader range of objects and assist in diagnostic research into cultural treasures in the future.
CVSep 8, 2017
Benchmarking Super-Resolution Algorithms on Real DataThomas Köhler, Michel Bätz, Farzad Naderi et al.
Over the past decades, various super-resolution (SR) techniques have been developed to enhance the spatial resolution of digital images. Despite the great number of methodical contributions, there is still a lack of comparative validations of SR under practical conditions, as capturing real ground truth data is a challenging task. Therefore, current studies are either evaluated 1) on simulated data or 2) on real data without a pixel-wise ground truth. To facilitate comprehensive studies, this paper introduces the publicly available Super-Resolution Erlangen (SupER) database that includes real low-resolution images along with high-resolution ground truth data. Our database comprises image sequences with more than 20k images captured from 14 scenes under various types of motions and photometric conditions. The datasets cover four spatial resolution levels using camera hardware binning. With this database, we benchmark 15 single-image and multi-frame SR algorithms. Our experiments quantitatively analyze SR accuracy and robustness under realistic conditions including independent object and camera motion or photometric variations.
CVAug 17, 2012
An Evaluation of Popular Copy-Move Forgery Detection ApproachesVincent Christlein, Christian Riess, Johannes Jordan et al.
A copy-move forgery is created by copying and pasting content within the same image, and potentially post-processing it. In recent years, the detection of copy-move forgeries has become one of the most actively researched topics in blind image forensics. A considerable number of different algorithms have been proposed focusing on different types of postprocessed copies. In this paper, we aim to answer which copy-move forgery detection algorithms and processing steps (e.g., matching, filtering, outlier detection, affine transformation estimation) perform best in various postprocessing scenarios. The focus of our analysis is to evaluate the performance of previously proposed feature sets. We achieve this by casting existing algorithms in a common pipeline. In this paper, we examined the 15 most prominent feature sets. We analyzed the detection performance on a per-image basis and on a per-pixel basis. We created a challenging real-world copy-move dataset, and a software framework for systematic image manipulation. Experiments show, that the keypoint-based features SIFT and SURF, as well as the block-based DCT, DWT, KPCA, PCA and Zernike features perform very well. These feature sets exhibit the best robustness against various noise sources and downsampling, while reliably identifying the copied regions.