Blaž Rolih

CV
h-index31
7papers
35citations
Novelty51%
AI Score56

7 Papers

CVAug 6, 2024Code
SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection

Blaž Rolih, Matic Fučka, Danijel Skočaj

The aim of surface defect detection is to identify and localise abnormal regions on the surfaces of captured objects, a task that's increasingly demanded across various industries. Current approaches frequently fail to fulfil the extensive demands of these industries, which encompass high performance, consistency, and fast operation, along with the capacity to leverage the entirety of the available training data. Addressing these gaps, we introduce SuperSimpleNet, an innovative discriminative model that evolved from SimpleNet. This advanced model significantly enhances its predecessor's training consistency, inference time, as well as detection performance. SuperSimpleNet operates in an unsupervised manner using only normal training images but also benefits from labelled abnormal training images when they are available. SuperSimpleNet achieves state-of-the-art results in both the supervised and the unsupervised settings, as demonstrated by experiments across four challenging benchmark datasets. Code: https://github.com/blaz-r/SuperSimpleNet .

CVMay 14
ChangeFlow -- Latent Rectified Flow for Change Detection in Remote Sensing

Blaž Rolih, Matic Fučka, Filip Wolf et al.

Remote sensing change detection (RSCD) aims to localise changes between two images of the same geographic region. In practice, change masks often follow region-level annotation conventions rather than purely local appearance differences, making them context-dependent and occasionally ambiguous. Most state-of-the-art methods utilise per-pixel discriminative classification, which produces a single prediction per input and fails to explicitly model the changed region as a coherent whole. A natural alternative is generative formulation, which can model a distribution of plausible masks, enabling sampling to capture ambiguity and encourage global consistency. However, existing generative RSCD approaches typically lag behind strong discriminative baselines due to the high computational cost of pixel-space generation and the complexity of their conditioning mechanisms. To address the limitations of prior discriminative and generative methods, we propose ChangeFlow, a generative framework that reformulates change detection as the synthesis of a change mask in latent space via rectified flow. ChangeFlow is guided by a structured yet lightweight conditioning signal, and its stochastic design naturally supports sampling-based prediction ensembling. Namely, aggregating multiple predicted change masks improves robustness, while sample agreement provides a practical confidence estimation that highlights ambiguous regions. Across four benchmarks, ChangeFlow achieves an average F1 of 80.4\%, improving by 1.3 points on average over the previous best method, while maintaining inference speed comparable to recent strong baselines. Project page: https://blaz-r.github.io/changeflow_cd

CVAug 26, 2025Code
No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes

Blaž Rolih, Matic Fučka, Danijel Skočaj

Surface defect detection is a critical task across numerous industries, aimed at efficiently identifying and localising imperfections or irregularities on manufactured components. While numerous methods have been proposed, many fail to meet industrial demands for high performance, efficiency, and adaptability. Existing approaches are often constrained to specific supervision scenarios and struggle to adapt to the diverse data annotations encountered in real-world manufacturing processes, such as unsupervised, weakly supervised, mixed supervision, and fully supervised settings. To address these challenges, we propose SuperSimpleNet, a highly efficient and adaptable discriminative model built on the foundation of SimpleNet. SuperSimpleNet incorporates a novel synthetic anomaly generation process, an enhanced classification head, and an improved learning procedure, enabling efficient training in all four supervision scenarios, making it the first model capable of fully leveraging all available data annotations. SuperSimpleNet sets a new standard for performance across all scenarios, as demonstrated by its results on four challenging benchmark datasets. Beyond accuracy, it is very fast, achieving an inference time below 10 ms. With its ability to unify diverse supervision paradigms while maintaining outstanding speed and reliability, SuperSimpleNet represents a promising step forward in addressing real-world manufacturing challenges and bridging the gap between academic research and industrial applications. Code: https://github.com/blaz-r/SuperSimpleNet

CVJul 4, 2025Code
Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices

Blaž Rolih, Matic Fučka, Filip Wolf et al.

Remote sensing change detection aims to localize semantic changes between images of the same location captured at different times. In the past few years, newer methods have attributed enhanced performance to the additions of new and complex components to existing architectures. Most fail to measure the performance contribution of fundamental design choices such as backbone selection, pre-training strategies, and training configurations. We claim that such fundamental design choices often improve performance even more significantly than the addition of new architectural components. Due to that, we systematically revisit the design space of change detection models and analyse the full potential of a well-optimised baseline. We identify a set of fundamental design choices that benefit both new and existing architectures. Leveraging this insight, we demonstrate that when carefully designed, even an architecturally simple model can match or surpass state-of-the-art performance on six challenging change detection datasets. Our best practices generalise beyond our architecture and also offer performance improvements when applied to related methods, indicating that the space of fundamental design choices has been underexplored. Our guidelines and architecture provide a strong foundation for future methods, emphasizing that optimizing core components is just as important as architectural novelty in advancing change detection performance. Code: https://github.com/blaz-r/BTC-change-detection

CVFeb 23
Make Some Noise: Unsupervised Remote Sensing Change Detection Using Latent Space Perturbations

Blaž Rolih, Matic Fučka, Filip Wolf et al.

Unsupervised change detection (UCD) in remote sensing aims to localise semantic changes between two images of the same region without relying on labelled data during training. Most recent approaches rely either on frozen foundation models in a training-free manner or on training with synthetic changes generated in pixel space. Both strategies inherently rely on predefined assumptions about change types, typically introduced through handcrafted rules, external datasets, or auxiliary generative models. Due to these assumptions, such methods fail to generalise beyond a few change types, limiting their real-world usage, especially in rare or complex scenarios. To address this, we propose MaSoN (Make Some Noise), an end-to-end UCD framework that synthesises diverse changes directly in the latent feature space during training. It generates changes that are dynamically estimated using feature statistics of target data, enabling diverse yet data-driven variation aligned with the target domain. It also easily extends to new modalities, such as SAR. MaSoN generalises strongly across diverse change types and achieves state-of-the-art performance on five benchmarks, improving the average F1 score by 14.1 percentage points. Project page: https://blaz-r.github.io/mason_ucd

CVFeb 23
Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation

Filip Wolf, Blaž Rolih, Luka Čehovin Zajc

Foundation models are transforming Earth Observation (EO), yet the diversity of EO sensors and modalities makes a single universal model unrealistic. Multiple specialized EO foundation models (EOFMs) will likely coexist, making efficient knowledge transfer across modalities essential. Most existing EO pretraining relies on masked image modeling, which emphasizes local reconstruction but provides limited control over global semantic structure. To address this, we propose a dual-teacher contrastive distillation framework for multispectral imagery that aligns the student's pretraining objective with the contrastive self-distillation paradigm of modern optical vision foundation models (VFMs). Our approach combines a multispectral teacher with an optical VFM teacher, enabling coherent cross-modal representation learning. Experiments across diverse optical and multispectral benchmarks show that our model adapts to multispectral data without compromising performance on optical-only inputs, achieving state-of-the-art results in both settings, with an average improvement of 3.64 percentage points in semantic segmentation, 1.2 in change detection, and 1.31 in classification tasks. This demonstrates that contrastive distillation provides a principled and efficient approach to scalable representation learning across heterogeneous EO data sources. Project page: \textcolor{magenta}{https://wolfilip.github.io/DEO/}.

CVMar 7, 2024
Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble

Blaž Rolih, Dick Ameln, Ashwin Vaidya et al.

Industrial anomaly detection is an important task within computer vision with a wide range of practical use cases. The small size of anomalous regions in many real-world datasets necessitates processing the images at a high resolution. This frequently poses significant challenges concerning memory consumption during the model training and inference stages, leaving some existing methods impractical for widespread adoption. To overcome this challenge, we present the tiled ensemble approach, which reduces memory consumption by dividing the input images into a grid of tiles and training a dedicated model for each tile location. The tiled ensemble is compatible with any existing anomaly detection model without the need for any modification of the underlying architecture. By introducing overlapping tiles, we utilize the benefits of traditional stacking ensembles, leading to further improvements in anomaly detection capabilities beyond high resolution alone. We perform a comprehensive analysis using diverse underlying architectures, including Padim, PatchCore, FastFlow, and Reverse Distillation, on two standard anomaly detection datasets: MVTec and VisA. Our method demonstrates a notable improvement across setups while remaining within GPU memory constraints, consuming only as much GPU memory as a single model needs to process a single tile.