CV | Apr 14 | Code
Semantically Stable Image Composition Analysis via Saliency and Gradient Vector Flow Fusion
Armin Dadras, Robert Sablatnig, Franziska Proksa et al.
The reliable computational assessment of photographic composition requires features that are discriminative of spatial layout yet robust to semantic content. This paper proposes a low-level representation grounded in the assumption that composition can be understood as the flow of visual attention across geometric structure. We introduce VFCNet, which fuses saliency and edge information into a gradient vector flow (GVF) field. The model computes dual-stream GVF representations, integrates them via attention, and extracts multi-scale flow features with a DINOv3 backbone. VFCNet achieves state-of-the-art performance on the PICD benchmark (CDA-1: 0.683, CDA-2: 0.629), improving by 33.1% and 36.1% over the previous best method. We also show that a simple classifier on self-supervised DINOv3 features substantially outperforms more sophisticated, composition-specialized models. Code is available at https://github.com/ADadras/VFCNet
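For readers unfamiliar with gradient vector flow, the following is a minimal sketch of the classic GVF iteration (Xu and Prince) applied to a single edge map. It is not VFCNet's implementation: the function names, the parameters `mu`, `dt`, and `iterations`, and the toy input are assumptions chosen for illustration, and the saliency stream and attention fusion described in the abstract are not modelled here.

```python
import numpy as np


def laplacian(a):
    """Five-point Laplacian with replicated (zero-flux) borders."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * a


def gradient_vector_flow(edge_map, mu=0.2, dt=0.5, iterations=200):
    """Diffuse the gradient of an edge map into a dense vector field (u, v).

    Hypothetical helper: mu weights smoothness, dt is the explicit time step.
    """
    f = edge_map.astype(float)
    span = f.max() - f.min()
    if span > 0:
        f = (f - f.min()) / span          # normalise to [0, 1]
    fy, fx = np.gradient(f)               # derivatives along rows (y), columns (x)
    mag_sq = fx ** 2 + fy ** 2            # data-term weight, large near edges
    u, v = fx.copy(), fy.copy()
    for _ in range(iterations):
        # Smooth the field everywhere, but stay close to the raw gradient near edges.
        u = u + dt * (mu * laplacian(u) - mag_sq * (u - fx))
        v = v + dt * (mu * laplacian(v) - mag_sq * (v - fy))
    return u, v


# Toy input: a single vertical edge line in an otherwise empty image.
edge = np.zeros((32, 32))
edge[:, 16] = 1.0
u, v = gradient_vector_flow(edge)
```

In this toy case the raw gradient is zero away from the line, whereas the diffused field still points toward the line from both sides. That extended capture range is the property that makes GVF attractive as a dense, content-agnostic layout descriptor.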
CV | Oct 17, 2025 | Code
DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification
Tingyu Lin, Armin Dadras, Florian Kleber et al.
Camera movement classification (CMC) models trained on contemporary, high-quality footage often degrade when applied to archival film, where noise, missing frames, and low contrast obscure motion cues. We bridge this gap by assembling a unified benchmark that consolidates two modern corpora into four canonical classes and restructures the HISTORIAN collection into five balanced categories. Building on this benchmark, we introduce DGME-T, a lightweight extension to the Video Swin Transformer that injects directional grid motion encoding, derived from optical flow, via a learnable and normalised late-fusion layer. DGME-T raises the backbone's top-1 accuracy from 81.78% to 86.14% and its macro F1 from 82.08% to 87.81% on modern clips, while still improving the demanding World-War-II footage from 83.43% to 84.62% accuracy and from 81.72% to 82.63% macro F1. A cross-domain study further shows that an intermediate fine-tuning stage on modern data increases historical performance by more than five percentage points. These results demonstrate that structured motion priors and transformer representations are complementary and that even a small, carefully calibrated motion head can substantially enhance robustness in degraded film analysis. Related resources are available at https://github.com/linty5/DGME-T.
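The abstract does not spell out how the directional grid motion encoding is computed, so the following is only a plausible sketch: a magnitude-weighted histogram of optical-flow directions per grid cell. The function name `directional_grid_encoding`, the grid size, the number of direction bins, and the per-cell normalisation are all assumptions, and the learnable late-fusion layer is not shown.

```python
import numpy as np


def directional_grid_encoding(flow, grid=(4, 4), bins=8):
    """Encode a dense flow field as per-cell direction histograms.

    flow: array of shape (H, W, 2) holding (dx, dy) per pixel.
    Returns a flat vector of length grid_rows * grid_cols * bins.
    Hypothetical helper for illustration, not the DGME-T implementation.
    """
    h, w, _ = flow.shape
    rows, cols = grid
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)                       # in (-pi, pi]
    bin_width = 2.0 * np.pi / bins
    # Bins are centred on their direction, so angle 0 falls in bin 0.
    bin_index = np.rint(angle / bin_width).astype(int) % bins

    row_edges = np.linspace(0, h, rows + 1).astype(int)
    col_edges = np.linspace(0, w, cols + 1).astype(int)
    features = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            cell = (slice(row_edges[r], row_edges[r + 1]),
                    slice(col_edges[c], col_edges[c + 1]))
            hist = np.bincount(bin_index[cell].ravel(),
                               weights=magnitude[cell].ravel(),
                               minlength=bins)
            total = hist.sum()
            # Normalise each cell so the encoding reflects direction, not speed.
            features[r, c] = hist / total if total > 0 else hist
    return features.ravel()


# Toy input: uniform motion of one pixel per frame in the +x direction.
pan_right = np.zeros((32, 32, 2))
pan_right[..., 0] = 1.0
encoding = directional_grid_encoding(pan_right, grid=(2, 2), bins=8)
```

For uniform motion every cell places all of its mass in the same direction bin, while other movement types would be expected to produce cell-dependent direction patterns. A vector of this kind could then be concatenated with, or added to, the backbone's clip embedding in a late-fusion step.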
CV | Oct 16, 2025
Camera Movement Classification in Historical Footage: A Comparative Study of Deep Video Models
Tingyu Lin, Armin Dadras, Florian Kleber et al.
Camera movement conveys spatial and narrative information essential for understanding video content. While recent camera movement classification (CMC) methods perform well on modern datasets, their generalization to historical footage remains unexplored. This paper presents the first systematic evaluation of deep video CMC models on archival film material. We summarize representative methods and datasets, highlighting differences in model design and label definitions. Five standard video classification models are assessed on the HISTORIAN dataset, which includes expert-annotated World War II footage. The best-performing model, Video Swin Transformer, achieves 80.25% accuracy, showing strong convergence despite limited training data. Our findings highlight the challenges and potential of adapting existing models to low-quality video and motivate future work combining diverse input modalities and temporal architectures.
IV | Aug 5, 2022
Distance-based detection of out-of-distribution silent failures for Covid-19 lung lesion segmentation
Camila Gonzalez, Karol Gotkowski, Moritz Fuchs et al.
Automatic segmentation of ground glass opacities and consolidations in chest computed tomography (CT) scans can potentially ease the burden on radiologists during times of high resource utilisation. However, deep learning models are not trusted in the clinical routine due to failing silently on out-of-distribution (OOD) data. We propose a lightweight OOD detection method that leverages the Mahalanobis distance in the feature space and seamlessly integrates into state-of-the-art segmentation pipelines. The simple approach can even augment pre-trained models with clinically relevant uncertainty quantification. We validate our method across four chest CT distribution shifts and two magnetic resonance imaging applications, namely segmentation of the hippocampus and the prostate. Our results show that the proposed method effectively detects far- and near-OOD samples across all explored scenarios.
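The core idea of Mahalanobis-distance OOD detection can be sketched as follows: fit a Gaussian to feature vectors from the training distribution, then score a new sample by its distance to that Gaussian. This is a generic illustration, not the authors' pipeline. The class name `MahalanobisOOD`, the regularisation constant `eps`, the synthetic features, and the percentile-based threshold are assumptions; in practice the features would come from a layer of the segmentation network.

```python
import numpy as np


class MahalanobisOOD:
    """Score samples by Mahalanobis distance to the training feature distribution."""

    def fit(self, features, eps=1e-6):
        # features: (n_samples, n_dims) array of in-distribution feature vectors.
        self.mean = features.mean(axis=0)
        cov = np.cov(features, rowvar=False)
        cov = cov + eps * np.eye(features.shape[1])   # keep the covariance invertible
        self.precision = np.linalg.inv(cov)
        return self

    def distance(self, x):
        diff = np.atleast_2d(x) - self.mean
        squared = np.einsum("ij,jk,ik->i", diff, self.precision, diff)
        return np.sqrt(np.maximum(squared, 0.0))


# Synthetic stand-in for in-distribution features (assumption for illustration).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(500, 4))
detector = MahalanobisOOD().fit(train_features)

# One possible decision rule: flag anything beyond the 95th percentile
# of the distances observed on the training features.
threshold = np.percentile(detector.distance(train_features), 95)
is_ood = detector.distance(np.full(4, 10.0)) > threshold
```

Because only a mean vector and a covariance matrix need to be stored, a detector of this kind can be attached to an already trained model without retraining, which is consistent with the abstract's point about augmenting pre-trained models.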
IV | Dec 16, 2021
Quality monitoring of federated Covid-19 lesion segmentation
Camila Gonzalez, Christian Harder, Amin Ranem et al.
Federated Learning is the most promising way to train robust Deep Learning models for the segmentation of Covid-19-related findings in chest CTs. By learning in a decentralized fashion, heterogeneous data can be leveraged from a variety of sources and acquisition protocols whilst ensuring patient privacy. It is, however, crucial to continuously monitor the performance of the model. Yet when it comes to the segmentation of diffuse lung lesions, a quick visual inspection is not enough to assess the quality, and thorough monitoring of all network outputs by expert radiologists is not feasible. In this work, we present an array of lightweight metrics that can be calculated locally in each hospital and then aggregated for central monitoring of a federated system. Our linear model detects over 70% of low-quality segmentations on an out-of-distribution dataset and thus reliably signals a decline in model performance.