Glenn Van Wallendael

CV
h-index: 25
16 papers
123 citations
Novelty: 43%
AI Score: 51

16 Papers

CV · Jul 13, 2023 · Code
GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer

Deressa Wodajo Deressa, Hannes Mareen, Peter Lambert et al.

Deepfakes have raised significant concerns due to their potential to spread false information and compromise digital media integrity. Current deepfake detection models often struggle to generalize across a diverse range of deepfake generation techniques and video content. In this work, we propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection. Our model combines ConvNeXt and Swin Transformer models for feature extraction, and it utilizes Autoencoder and Variational Autoencoder to learn from the latent data distribution. By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos. The model is trained and evaluated on DFDC, FF++, TM, DeepfakeTIMIT, and Celeb-DF (v2) datasets. The proposed GenConViT model demonstrates strong performance in deepfake video detection, achieving high accuracy across the tested datasets. While our model shows promising results in deepfake video detection by leveraging visual and latent features, we demonstrate that further work is needed to improve its generalizability, i.e., when encountering out-of-distribution data. Our model provides an effective solution for identifying a wide range of fake videos while preserving media integrity. The open-source code for GenConViT is available at https://github.com/erprogs/GenConViT.

60.7 · LG · Jun 4
Proper Scoring Rules for Right-Censored Survival Data

Jef Jonkers, Glenn Van Wallendael, Luc Duchateau et al.

Proper scoring rules provide a rigorous theoretical basis for the training and evaluation of probabilistic forecasts. However, in the presence of right censoring, the event time is only partially observed, rendering conventional scoring rules inapplicable in their standard form. We propose a framework for proper scoring of right-censored survival outcomes based on a simple idea: first, map the predictive distribution through the censoring mechanism, then apply the underlying proper score on the induced observed-data law. This yields localized scores for fixed censoring times and marginalized scores when the censoring time is random or only partially observed. The resulting construction recovers familiar right-censored likelihood and IPCW-type criteria within a coherent framework, while also yielding right-censored versions of the CRPS, pinball loss, Brier score, and energy score. We show that the marginalized score is proper under conditional independent censoring and strictly proper on the identifiable region. The same principle also leads to censored engression, a sample-based learning objective for multivariate right-censored survival modeling. In experiments, our scores correctly rank the oracle forecast across several censoring regimes, whereas forecast-dependent plug-in weighted scores can exhibit ranking reversals. Censored engression likewise substantially improves over naive training on censored outcomes.
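
The abstract notes that the construction recovers the familiar right-censored likelihood. As a minimal sketch of that familiar criterion only (not the paper's framework or its censored CRPS, pinball, Brier, or energy scores), the snippet below evaluates the right-censored negative log-likelihood for an exponential model. The function name, the exponential assumption, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math

def censored_exponential_nll(times, events, rate):
    """Average right-censored negative log-likelihood under Exponential(rate).

    times  : observed time, i.e. min(event time, censoring time)
    events : 1 if the event was observed, 0 if the record was censored
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    total = 0.0
    for t, d in zip(times, events):
        log_density = math.log(rate) - rate * t  # log f(t), used when the event is observed
        log_survival = -rate * t                 # log S(t), used when the record is censored
        total -= d * log_density + (1 - d) * log_survival
    return total / len(times)

# Hypothetical toy data: two observed events and two censored records.
times = [2.0, 5.0, 1.0, 4.0]
events = [1, 0, 1, 0]

# For the exponential model the maximiser has a closed form:
# number of observed events divided by total time at risk.
mle_rate = sum(events) / sum(times)
```

Observed events contribute the log-density and censored records contribute the log-survival, so a censored record only asserts that the event happened after the recorded time.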

CV · Jul 16, 2024 · Code
TGIF: Text-Guided Inpainting Forgery Dataset

Hannes Mareen, Dimitrios Karageorgiou, Glenn Van Wallendael et al.

Digital image manipulation has become increasingly accessible and realistic with the advent of generative AI technologies. Recent developments allow for text-guided inpainting, making sophisticated image edits possible with minimal effort. This poses new challenges for digital media forensics. For example, diffusion model-based approaches could either splice the inpainted region into the original image, or regenerate the entire image. In the latter case, traditional image forgery localization (IFL) methods typically fail. This paper introduces the Text-Guided Inpainting Forgery (TGIF) dataset, a comprehensive collection of images designed to support the training and evaluation of image forgery localization and synthetic image detection (SID) methods. The TGIF dataset includes approximately 75k forged images, originating from popular open-source and commercial methods, namely SD2, SDXL, and Adobe Firefly. We benchmark several state-of-the-art IFL and SID methods on TGIF. Whereas traditional IFL methods can detect spliced images, they fail to detect regenerated inpainted images. Moreover, traditional SID may detect the regenerated inpainted images to be fake, but cannot localize the inpainted area. Finally, both IFL and SID methods fail when exposed to stronger compression, while they are less robust to modern compression algorithms, such as WEBP. In conclusion, this work demonstrates the inefficiency of state-of-the-art detectors on local manipulations performed by modern generative approaches, and aspires to help with the development of more capable IFL and SID methods. The dataset and code can be downloaded at https://github.com/IDLabMedia/tgif-dataset.

66.4 · CV · Mar 30 · Code
TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark

Hannes Mareen, Dimitrios Karageorgiou, Paschalis Giakoumoglou et al.

Generative AI has made text-guided inpainting a powerful image editing tool, but at the same time a growing challenge for media forensics. Existing benchmarks, including our text-guided inpainting forgery (TGIF) dataset, show that image forgery localization (IFL) methods can localize manipulations in spliced images but struggle in fully regenerated (FR) images, while synthetic image detection (SID) methods can detect fully regenerated images but cannot perform localization. With new generative inpainting models emerging and the open problem of localization in FR images remaining, updated datasets and benchmarks are needed. We introduce TGIF2, an extended version of TGIF, that captures recent advances in text-guided inpainting and enables a deeper analysis of forensic robustness. TGIF2 augments the original dataset with edits generated by FLUX.1 models, as well as with random non-semantic masks. Using the TGIF2 dataset, we conduct a forensic evaluation spanning IFL and SID, including fine-tuning IFL methods on FR images and generative super-resolution attacks. Our experiments show that both IFL and SID methods degrade on FLUX.1 manipulations, highlighting limited generalization. Additionally, while fine-tuning improves localization on FR images, evaluation with random non-semantic masks reveals object bias. Furthermore, generative super-resolution significantly weakens forensic traces, demonstrating that common image enhancement operations can undermine current forensic pipelines. In summary, TGIF2 provides an updated dataset and benchmark, which enables new insights into the challenges posed by modern inpainting and AI-based image enhancements. TGIF2 is available at https://github.com/IDLabMedia/tgif-dataset.

CV · Oct 5, 2022
Comprint: Image Forgery Detection and Localization using Compression Fingerprints

Hannes Mareen, Dante Vanden Bussche, Fabrizio Guillaro et al.

Manipulation tools that realistically edit images are widely available, making it easy for anyone to create and spread misinformation. In an attempt to fight fake news, forgery detection and localization methods were designed. However, existing methods struggle to accurately reveal manipulations found in images on the internet, i.e., in the wild. That is because the type of forgery is typically unknown, in addition to the tampering traces being damaged by recompression. This paper presents Comprint, a novel forgery detection and localization method based on the compression fingerprint or comprint. It is trained on pristine data only, providing generalization to detect different types of manipulation. Additionally, we propose a fusion of Comprint with the state-of-the-art Noiseprint, which utilizes a complementary camera model fingerprint. We carry out an extensive experimental analysis and demonstrate that Comprint has a high level of accuracy on five evaluation datasets that represent a wide range of manipulation types, mimicking in-the-wild circumstances. Most notably, the proposed fusion significantly outperforms state-of-the-art reference methods. As such, Comprint and the fusion Comprint+Noiseprint represent a promising forensics tool to analyze in-the-wild tampered images.

MM · Nov 25, 2022
Training Data Improvement for Image Forgery Detection using Comprint

Hannes Mareen, Dante Vanden Bussche, Glenn Van Wallendael et al.

Manipulated images are a threat to consumers worldwide, when they are used to spread disinformation. Therefore, Comprint enables forgery detection by utilizing JPEG-compression fingerprints. This paper evaluates the impact of the training set on Comprint's performance. Most interestingly, we found that including images compressed with low quality factors during training does not have a significant effect on the accuracy, whereas incorporating recompression boosts the robustness. As such, consumers can use Comprint on their smartphones to verify the authenticity of images.

CV · Jan 21
POTR: Post-Training 3DGS Compression

Bert Ramlot, Martijn Courteaux, Peter Lambert et al.

3D Gaussian Splatting (3DGS) has recently emerged as a promising contender to Neural Radiance Fields (NeRF) in 3D scene reconstruction and real-time novel view synthesis. 3DGS outperforms NeRF in training and inference speed but has substantially higher storage requirements. To remedy this downside, we propose POTR, a post-training 3DGS codec built on two novel techniques. First, POTR introduces a novel pruning approach that uses a modified 3DGS rasterizer to efficiently calculate every splat's individual removal effect simultaneously. This technique results in 2-4x fewer splats than other post-training pruning techniques and as a result also significantly accelerates inference with experiments demonstrating 1.5-2x faster inference than other compressed models. Second, we propose a novel method to recompute lighting coefficients, significantly reducing their entropy without using any form of training. Our fast and highly parallel approach especially increases AC lighting coefficient sparsity, with experiments demonstrating increases from 70% to 97%, with minimal loss in quality. Finally, we extend POTR with a simple fine-tuning scheme to further enhance pruning, inference, and rate-distortion performance. Experiments demonstrate that POTR, even without fine-tuning, consistently outperforms all other post-training compression techniques in both rate-distortion performance and inference speed.

LG · Nov 27, 2025 · Code
Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra

Deressa Wodajo Deressa, Hannes Mareen, Peter Lambert et al.

We present Generative Anchored Fields (GAF), a generative model that learns independent endpoint predictors, $J$ (noise) and $K$ (data), from any point on a linear bridge. Unlike existing approaches that use a single trajectory or score predictor, GAF is trained to recover the bridge endpoints directly via coordinate learning. The velocity field $v=K-J$ emerges from their time-conditioned disagreement. This factorization enables Transport Algebra: algebraic operations on multiple $J/K$ heads for compositional control. With class-specific $K_n$ heads, GAF defines directed transport maps between a shared base noise distribution and multiple data domains, allowing controllable interpolation, multi-class composition, and semantic editing. This is achieved either directly on the predicted data coordinates ($K$) using Iterative Endpoint Refinement (IER), a novel sampler that achieves high-quality generation in 5 to 8 steps, or on the emergent velocity field ($v$). We achieve strong sample quality (FID $7.51$ on ImageNet $256\times256$ and $7.27$ on CelebA-HQ $256\times256$, without classifier-free guidance) while treating compositional generation as an architectural primitive. Code available at https://github.com/IDLabMedia/GAF.
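
The relation $v = K - J$ can be checked in an idealized case. The sketch below assumes the linear bridge is parametrized as $x_t = (1-t)\,x_0 + t\,x_1$, with $x_0$ the noise endpoint and $x_1$ the data endpoint, and that both predictors recover their endpoints exactly. The abstract does not state the parametrization, so $x_0$, $x_1$, and $x_t$ are assumed notation.

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad J(x_t, t) = x_0, \qquad K(x_t, t) = x_1
\;\Longrightarrow\;
v(x_t, t) = K(x_t, t) - J(x_t, t) = x_1 - x_0 = \frac{\mathrm{d}x_t}{\mathrm{d}t}.
```

Under these assumptions, the difference of the two endpoint predictions coincides with the time derivative of the bridge, which is consistent with the abstract's description of the velocity field as emergent.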

LG · Apr 23, 2024
Conformal Predictive Systems Under Covariate Shift

Jef Jonkers, Glenn Van Wallendael, Luc Duchateau et al.

Conformal Predictive Systems (CPS) offer a versatile framework for constructing predictive distributions, allowing for calibrated inference and informative decision-making. However, their applicability has been limited to scenarios adhering to the Independent and Identically Distributed (IID) model assumption. This paper extends CPS to accommodate scenarios characterized by covariate shifts. We therefore propose Weighted CPS (WCPS), akin to Weighted Conformal Prediction (WCP), leveraging likelihood ratios between training and testing covariate distributions. This extension enables the construction of nonparametric predictive distributions capable of handling covariate shifts. We present theoretical underpinnings and conjectures regarding the validity and efficacy of WCPS and demonstrate its utility through empirical evaluations on both synthetic and real-world datasets. Our simulation experiments indicate that WCPS are probabilistically calibrated under covariate shift.
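
The abstract describes WCPS as akin to Weighted Conformal Prediction, which reweights calibration points by likelihood ratios between test and training covariate distributions. The sketch below shows only the likelihood-ratio-weighted quantile used in standard Weighted Conformal Prediction, not the WCPS construction itself. The function name and arguments are illustrative assumptions, and the likelihood ratios are taken as given rather than estimated.

```python
import math

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of calibration scores reweighted by likelihood ratios.

    scores      : nonconformity scores on the calibration set
    weights     : likelihood ratio w(x_i) for each calibration point
    test_weight : likelihood ratio w(x) at the test point
    Returns math.inf when the calibration mass alone cannot reach the level,
    because the test point's own mass sits at +infinity.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total  # normalised probability mass of this score
        if cumulative >= 1 - alpha:
            return score
    return math.inf

# Hypothetical calibration scores and likelihood ratios.
scores = [1.0, 2.0, 3.0, 4.0]
uniform = weighted_conformal_quantile(scores, [1.0, 1.0, 1.0, 1.0], 1.0, alpha=0.3)
shifted = weighted_conformal_quantile(scores, [4.0, 1.0, 1.0, 1.0], 1.0, alpha=0.3)
```

With equal weights this reduces to the usual split-conformal quantile. Upweighting calibration points that resemble the test distribution moves the quantile toward their scores.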

MM · Feb 14, 2024
Blind Deep-Learning-Based Image Watermarking Robust Against Geometric Transformations

Hannes Mareen, Lucas Antchougov, Glenn Van Wallendael et al.

Digital watermarking enables protection against copyright infringement of images. Although existing methods embed watermarks imperceptibly and demonstrate robustness against attacks, they typically lack resilience against geometric transformations. Therefore, this paper proposes a new watermarking method that is robust against geometric attacks. The proposed method is based on the existing HiDDeN architecture that uses deep learning for watermark encoding and decoding. We add new noise layers to this architecture, namely for a differentiable JPEG estimation, rotation, rescaling, translation, shearing and mirroring. We demonstrate that our method outperforms the state of the art when it comes to geometric robustness. In conclusion, the proposed method can be used to protect images when viewed on consumers' devices.

LG · Feb 7, 2024
Conformal Convolution and Monte Carlo Meta-learners for Predictive Inference of Individual Treatment Effects

Jef Jonkers, Jarne Verhaeghe, Glenn Van Wallendael et al.

Generating probabilistic forecasts of potential outcomes and individual treatment effects (ITE) is essential for risk-aware decision-making in domains such as healthcare, policy, marketing, and finance. We propose two novel methods: the conformal convolution T-learner (CCT) and the conformal Monte Carlo (CMC) meta-learner, that generate full predictive distributions of both potential outcomes and ITEs. Our approaches combine weighted conformal predictive systems with either analytic convolution of potential outcome distributions or Monte Carlo sampling, addressing covariate shift through propensity score weighting. In contrast to other approaches that allow the generation of potential outcome predictive distributions, our approaches are model agnostic, universal, and come with finite-sample guarantees of probabilistic calibration under knowledge of the propensity score. Regarding estimating the ITE distribution, we formally characterize how assumptions about potential outcomes' noise dependency impact distribution validity and establish universal consistency under independence noise assumptions. Experiments on synthetic and semi-synthetic datasets demonstrate that the proposed methods achieve probabilistically calibrated predictive distributions while maintaining narrow prediction intervals and having performant continuous ranked probability scores. Besides probabilistic forecasting performance, we observe significant efficiency gains for the CCT- and CMC meta-learners compared to other conformal approaches that produce prediction intervals for ITE with coverage guarantees.

CV · Jan 17, 2025
landmarker: a Toolkit for Anatomical Landmark Localization in 2D/3D Images

Jef Jonkers, Luc Duchateau, Glenn Van Wallendael et al.

Anatomical landmark localization in 2D/3D images is a critical task in medical imaging. Although many general-purpose tools exist for landmark localization in classical computer vision tasks, such as pose estimation, they lack the specialized features and modularity necessary for anatomical landmark localization applications in the medical domain. Therefore, we introduce landmarker, a Python package built on PyTorch. The package provides a comprehensive, flexible toolkit for developing and evaluating landmark localization algorithms, supporting a range of methodologies, including static and adaptive heatmap regression. landmarker enhances the accuracy of landmark identification, streamlines research and development processes, and supports various image formats and preprocessing pipelines. Its modular design allows users to customize and extend the toolkit for specific datasets and applications, accelerating innovation in medical imaging. landmarker addresses a critical need for precision and customization in landmark localization tasks not adequately met by existing general-purpose pose estimation tools.

CV · Mar 18, 2025
Reliable uncertainty quantification for 2D/3D anatomical landmark localization using multi-output conformal prediction

Jef Jonkers, Frank Coopman, Luc Duchateau et al.

Automatic anatomical landmark localization in medical imaging requires not just accurate predictions but reliable uncertainty quantification for effective clinical decision support. Current uncertainty quantification approaches often fall short, particularly when combined with normality assumptions, systematically underestimating total predictive uncertainty. This paper introduces conformal prediction as a framework for reliable uncertainty quantification in anatomical landmark localization, addressing a critical gap in automatic landmark localization. We present two novel approaches guaranteeing finite-sample validity for multi-output prediction: Multi-output Regression-as-Classification Conformal Prediction (M-R2CCP) and its variant Multi-output Regression to Classification Conformal Prediction set to Region (M-R2C2R). Unlike conventional methods that produce axis-aligned hyperrectangular or ellipsoidal regions, our approaches generate flexible, non-convex prediction regions that better capture the underlying uncertainty structure of landmark predictions. Through extensive empirical evaluation across multiple 2D and 3D datasets, we demonstrate that our methods consistently outperform existing multi-output conformal prediction approaches in both validity and efficiency. This work represents a significant advancement in reliable uncertainty estimation for anatomical landmark localization, providing clinicians with trustworthy confidence measures for their diagnoses. While developed for medical imaging, these methods show promise for broader applications in multi-output regression problems.

HC · Jan 27, 2021
Art and Science Interaction Lab -- A highly flexible and modular interaction science research facility

Niels Van Kets, Bart Moens, Klaas Bombeke et al.

The Art and Science Interaction Lab (ASIL) is a unique, highly flexible and modular interaction science research facility to effectively bring, analyse and test experiences and interactions in mixed virtual/augmented contexts as well as to conduct research on next-gen immersive technologies. It brings together the expertise and creativity of engineers, performers, designers and scientists creating solutions and experiences shaping the lives of people. The lab is equipped with state-of-the-art visual, auditory and user-tracking equipment, fully synchronized and connected to a central backend. This synchronization allows for highly accurate multi-sensor measurements and analysis.

MM · Jan 12, 2021
Network-Distributed Video Coding

Johan De Praeter, Christopher Hollmann, Rickard Sjoberg et al.

Nowadays, an enormous number of videos are streamed every day to countless users, all using different devices and networks. These videos must be adapted in order to provide users with the most suitable video representation based on their device properties and current network conditions. However, the two most common techniques for video adaptation, simulcast and transcoding, represent two extremes. The former offers excellent scalability, but requires a large amount of storage, while the latter has a small storage cost, but is not scalable to many users due to the additional computing cost per requested representation. As a third, in-between approach, network-distributed video coding (NDVC) was proposed within the Moving Picture Experts Group (MPEG). The aim of NDVC is to reduce the storage cost compared to simulcast, while retaining a smaller computing cost compared to transcoding. By exploring the proposed techniques for NDVC, we show the workings of this third option for video providers to deliver their content to their clients.

MM · Jan 10, 2020
Exploratory Study on User's Dynamic Visual Acuity and Quality Perception of Impaired Images

Jolien De Letter, Anissa All, Lieven De Marez et al.

In this paper we assess the impact of head movement on users' visual acuity and their quality perception of impaired images. There are physical limitations on the amount of visual information a person can perceive and physical limitations regarding the speed at which our body, and as a consequence our head, can explore a scene. In these limitations lie fundamental solutions for the communication of multimedia systems. As such, subjects were asked to evaluate the perceptual quality of static images presented on a TV screen while their head was in a dynamic (moving) state. The idea is potentially applicable to virtual reality applications and therefore, we also measured the image quality perception of each subject on a head-mounted display. Experiments show a significant decrease in visual acuity and quality perception when the user's head is not static, and give an indication of how much the quality can be reduced without the user noticing any impairments.