Stefan Harmeling

LG
h-index: 26
37 papers
1,217 citations
Novelty: 46%
AI Score: 46

37 Papers

CV | May 30, 2022 | Code
Deblurring Photographs of Characters Using Deep Neural Networks

Thomas Germer, Tobias Uelwer, Stefan Harmeling

In this paper, we present our approach for the Helsinki Deblur Challenge (HDC2021). The task of this challenge is to deblur images of characters without knowing the point spread function (PSF). The organizers provided a dataset of pairs of sharp and blurred images. Our method consists of three steps: First, we estimate a warping transformation of the images to align the sharp images with the blurred ones. Next, we estimate the PSF using a quasi-Newton method. The estimated PSF allows us to generate additional pairs of sharp and blurred images. Finally, we train a deep convolutional neural network to reconstruct the sharp images from the blurred images. Our method is able to successfully reconstruct images from the first 10 stages of the HDC2021 data. Our code is available at https://github.com/hhu-machine-learning/hdc2021-psfnn.

LG | Mar 13, 2023
Transformer-based World Models Are Happy With 100k Interactions

Jan Robine, Marc Höftmann, Tobias Uelwer et al.

Deep neural networks have been successful in many reinforcement learning settings. However, compared to human learners they are overly data hungry. To build a sample-efficient world model, we apply a transformer to real-world episodes in an autoregressive manner: not only the compact latent states and the taken actions but also the experienced or predicted rewards are fed into the transformer, so that it can attend flexibly to all three modalities at different time steps. The transformer allows our world model to access previous states directly, instead of viewing them through a compressed recurrent state. By utilizing the Transformer-XL architecture, it is able to learn long-term dependencies while staying computationally efficient. Our transformer-based world model (TWM) generates meaningful, new experience, which is used to train a policy that outperforms previous model-free and model-based reinforcement learning algorithms on the Atari 100k benchmark.

LG | Aug 22, 2023
A Survey on Self-Supervised Representation Learning

Tobias Uelwer, Jan Robine, Stefan Sylvius Wagner et al.

Learning meaningful representations is at the heart of many tasks in the field of modern machine learning. Recently, many methods have been introduced that allow learning of image representations without supervision. These representations can then be used in downstream tasks like classification or object detection. The quality of these representations is close to supervised learning, while no labeled images are needed. This survey paper provides a comprehensive review of these methods in a unified notation, points out similarities and differences of these methods, and proposes a taxonomy which sets these methods in relation to each other. Furthermore, our survey summarizes the most recent experimental results reported in the literature in the form of a meta-study. Our survey is intended as a starting point for researchers and practitioners who want to dive into the field of representation learning.

IV | Sep 13, 2023
Limited-Angle Tomography Reconstruction via Deep End-To-End Learning on Synthetic Data

Thomas Germer, Jan Robine, Sebastian Konietzny et al.

Computed tomography (CT) has become an essential part of modern science and medicine. A CT scanner consists of an X-ray source that is spun around an object of interest. On the opposite end of the X-ray source, a detector captures X-rays that are not absorbed by the object. The reconstruction of an image is a linear inverse problem, which is usually solved by filtered back projection. However, when the number of measurements is small, the reconstruction problem is ill-posed. This is, for example, the case when the X-ray source is not spun completely around the object, but rather irradiates the object only from a limited angle. To tackle this problem, we present a deep neural network that is trained on a large amount of carefully crafted synthetic data and can perform limited-angle tomography reconstruction even for only 30° or 40° sinograms. With our approach we won first place in the Helsinki Tomography Challenge 2022.

ML | Oct 26, 2022
Learning Causal Graphs in Manufacturing Domains using Structural Equation Models

Maximilian Kertel, Stefan Harmeling, Markus Pauly

Many production processes are characterized by numerous and complex cause-and-effect relationships. Since these relationships are only partially known, they pose a challenge to effective process control. In this work we present how Structural Equation Models can be used for deriving cause-and-effect relationships from the combination of prior knowledge and process data in the manufacturing domain. Compared to existing applications, we do not assume linear relationships, which leads to more informative results.

CL | Sep 12, 2024 | Code
Supporting Online Discussions: Integrating AI Into the adhocracy+ Participation Platform To Enhance Deliberation

Maike Behrendt, Stefan Sylvius Wagner, Mira Warne et al.

Online spaces provide individuals with the opportunity to engage in discussions on important topics and make collective decisions, regardless of their geographic location or time zone. However, without adequate support and careful design, such discussions often suffer from a lack of structure and civility in the exchange of opinions. Artificial intelligence (AI) offers a promising avenue for helping both participants and organizers in managing large-scale online participation processes. This paper introduces an extension of adhocracy+, a large-scale open-source participation platform. Our extension features two AI-supported debate modules designed to improve discussion quality and foster participant interaction. In a large-scale user study we examined the effects and usability of both modules. We report our findings in this paper. The extended platform is available at https://github.com/mabehrendt/discuss2.0.

LG | Aug 30, 2023
Cyclophobic Reinforcement Learning

Stefan Sylvius Wagner, Peter Arndt, Jan Robine et al.

In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent's success. However, there are two competing goals: novelty search and systematic exploration. While existing approaches such as curiosity-driven exploration find novelty, they sometimes do not systematically explore the whole state space, akin to depth-first search vs. breadth-first search. In this paper, we propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty, but punishes redundancy by avoiding cycles. Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent's cropped observations, we are able to achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is more sample efficient than other state-of-the-art methods in a variety of tasks.

LG | Jan 13, 2023
Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm

Marc Höftmann, Jan Robine, Stefan Harmeling

Very large state spaces with a sparse reward signal are difficult to explore. The lack of sophisticated guidance results in poor performance for numerous reinforcement learning algorithms. In these cases, the commonly used random exploration is often not helpful. The literature shows that these kinds of environments require enormous efforts to systematically explore large chunks of the state space. Learned state representations can help here to improve the search by providing semantic context and building a structure on top of the raw observations. In this work we introduce a novel time-myopic state representation that clusters temporally close states together while providing a time prediction capability between them. By adapting this model to the Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned state representation that reliably estimates novelty instead of using the hand-crafted representation heuristic. Our method shows an improved solution for the detachment problem, which still remains an issue in the Go-Explore Exploration Phase. We provide evidence that our proposed method covers the entire state space with respect to all possible time trajectories without causing disadvantageous conflict-overlaps in the cell archive. Analogous to native Go-Explore, our approach is evaluated on the hard exploration environments MontezumaRevenge, Gravitar and Frostbite (Atari) in order to validate its capabilities on difficult tasks. Our experiments show that time-myopic Go-Explore is an effective alternative for the domain-engineered heuristic while also being more general. The source code of the method is available on GitHub.

LG | May 31, 2022
Optimizing Intermediate Representations of Generative Models for Phase Retrieval

Tobias Uelwer, Sebastian Konietzny, Stefan Harmeling

Phase retrieval is the problem of reconstructing images from magnitude-only measurements. In many real-world applications the problem is underdetermined. When training data is available, generative models allow optimization in a lower-dimensional latent space, thereby constraining the solution set to those images that can be synthesized by the generative model. However, not all possible solutions are within the range of the generator. Instead, they are represented with some error. To reduce this representation error in the context of phase retrieval, we first leverage a novel variation of intermediate layer optimization (ILO) to extend the range of the generator while still producing images consistent with the training data. Second, we introduce new initialization schemes that further improve the quality of the reconstruction. With extensive experiments on the Fourier phase retrieval problem and thorough ablation studies, we can show the benefits of our modified ILO and the new initialization schemes. Additionally, we analyze the performance of our approach on the Gaussian phase retrieval problem.

CV | Oct 1, 2022
Blindly Deconvolving Super-noisy Blurry Image Sequences

Leonid Kostrykin, Stefan Harmeling

Image blur and image noise are imaging artifacts intrinsically arising in image acquisition. In this paper, we consider multi-frame blind deconvolution (MFBD), where image blur is described by the convolution of an unobservable, undeteriorated image and an unknown filter, and the objective is to recover the undeteriorated image from a sequence of its blurry and noisy observations. We present two new methods for MFBD, which, in contrast to previous work, do not require the estimation of the unknown filters. The first method is based on likelihood maximization and requires careful initialization to cope with the non-convexity of the loss function. The second method circumvents this requirement and exploits the fact that the solution of likelihood maximization emerges as an eigenvector of a specifically constructed matrix, if the signal subspace spanned by the observations has a sufficiently large dimension. We describe a pre-processing step, which increases the dimension of the signal subspace by artificially generating additional observations. We also propose an extension of the eigenvector method, which copes with insufficient dimensions of the signal subspace by estimating a footprint of the unknown filters (that is, a vector of the size of the filters; only one is required for the whole image sequence). We have applied the eigenvector method to synthetically generated image sequences and performed a quantitative comparison with a previous method, obtaining strongly improved results.

IV | Oct 26, 2021 | Code
A Closer Look at Reference Learning for Fourier Phase Retrieval

Tobias Uelwer, Nick Rucks, Stefan Harmeling

Reconstructing images from their Fourier magnitude measurements is a problem that often arises in different research areas. This process is also referred to as phase retrieval. In this work, we consider a modified version of the phase retrieval problem, which allows for a reference image to be added onto the image before the Fourier magnitudes are measured. We analyze an unrolled Gerchberg-Saxton (GS) algorithm that can be used to learn a good reference image from a dataset. Furthermore, we take a closer look at the learned reference images and propose a simple and efficient heuristic to construct reference images that, in some cases, yields reconstructions of comparable quality to approaches that learn references. Our code is available at https://github.com/tuelwer/reference-learning.
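
As a rough illustration of the measurement model described above, the following sketch uses a 1-D toy signal and a naive DFT. It is not taken from the paper's code: the signal `x`, its circular shift, and the reference `u` are hypothetical values chosen for illustration. It shows that plain Fourier magnitudes cannot tell a signal from its shifted copy, while magnitudes measured after adding a known reference can.

```python
import cmath

def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

def add_reference(signal, reference):
    """Add a known reference onto the signal before measuring."""
    return [x + u for x, u in zip(signal, reference)]

def max_gap(a, b):
    """Largest absolute difference between two measurement vectors."""
    return max(abs(p - q) for p, q in zip(a, b))

x = [1.0, 2.0, 0.0, 0.0]          # toy signal (assumption)
x_shifted = [0.0, 1.0, 2.0, 0.0]  # circular shift of x
u = [3.0, 0.0, 0.0, 0.0]          # hypothetical reference, not a learned one

# Without a reference, both signals produce the same magnitudes.
plain_gap = max_gap(dft_magnitudes(x), dft_magnitudes(x_shifted))

# With the reference added, the two measurements differ.
ref_gap = max_gap(dft_magnitudes(add_reference(x, u)),
                  dft_magnitudes(add_reference(x_shifted, u)))

print(plain_gap, ref_gap)
```

The reference here is hand-picked; the abstract concerns learning such references or constructing them with a heuristic, which this sketch does not attempt.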

CV | Mar 25, 2020 | Code
PyMatting: A Python Library for Alpha Matting

Thomas Germer, Tobias Uelwer, Stefan Conrad et al.

An important step of many image editing tasks is to extract specific objects from an image in order to place them in a scene of a movie or compose them onto another background. Alpha matting describes the problem of separating the objects in the foreground from the background of an image given only a rough sketch. We introduce the PyMatting package for Python which implements various approaches to solve the alpha matting problem. Our toolbox is also able to extract the foreground of an image given the alpha matte. The implementation aims to be computationally efficient and easy to use. The source code of PyMatting is available under an open-source license at https://github.com/pymatting/pymatting.

CL | Apr 11, 2024
SQBC: Active Learning using LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions

Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.

Stance detection is an important task for many applications that analyse or support online political discussions. Common approaches include fine-tuning transformer-based models. However, these models require a large amount of labelled data, which might not be available. In this work, we present two different ways to leverage LLM-generated synthetic data to train and improve stance detection agents for online political discussions: first, we show that augmenting a small fine-tuning dataset with synthetic data can improve the performance of the stance detection model. Second, we propose a new active learning method called SQBC based on the "Query-by-Committee" approach. The key idea is to use LLM-generated synthetic data as an oracle to identify the most informative unlabelled samples, which are then selected for manual labelling. Comprehensive experiments show that both ideas can improve the stance detection performance. Curiously, we observed that fine-tuning on actively selected samples can exceed the performance of using the full dataset.

CL | Apr 3, 2024
AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs

Maike Behrendt, Stefan Sylvius Wagner, Marc Ziegele et al.

Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with deep learning advancements, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices, and calculate correlation coefficients between experts' annotations and the perceived deliberativeness by non-experts to weigh the individual indices into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that have not been seen during training. The analysis of experts' vs. non-experts' annotations confirms theoretical findings in the social science literature.
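
A minimal sketch of an additive score of this kind, assuming each index yields a numeric prediction per post and each index has a fixed weight. The index names and weight values below are hypothetical placeholders, not the 20 indices or the correlation-derived weights from the paper.

```python
def additive_score(index_predictions, weights):
    """Weighted sum of per-index predictions for one discussion post."""
    return sum(weights[name] * value
               for name, value in index_predictions.items())

# Hypothetical weights; in the abstract these come from correlation
# coefficients between expert annotations and non-expert ratings.
weights = {"justification": 0.5, "respect": 0.25, "insult": -0.75}

# Hypothetical per-index predictions for two posts.
constructive_post = {"justification": 1.0, "respect": 1.0, "insult": 0.0}
hostile_post = {"justification": 0.0, "respect": 0.0, "insult": 1.0}

constructive_score = additive_score(constructive_post, weights)
hostile_score = additive_score(hostile_post, weights)
print(constructive_score, hostile_score)
```

Because the score is a plain sum, the contribution of each index stays visible, which is the transparency property the abstract points to.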

LG | Jun 3, 2025
Simple, Good, Fast: Self-Supervised World Models Free of Baggage

Jan Robine, Marc Höftmann, Stefan Harmeling

What are the essential components of world models? How far do we get with world models that are not employing RNNs, transformers, discrete representations, and image reconstructions? This paper introduces SGF, a Simple, Good, and Fast world model that uses self-supervised representation learning, captures short-time dependencies through frame and action stacking, and enhances robustness against model errors through data augmentation. We extensively discuss SGF's connections to established world models, evaluate the building blocks in ablation studies, and demonstrate good performance through quantitative comparisons on the Atari 100k benchmark.

LG | Feb 5, 2024
Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations

Stefan Sylvius Wagner, Stefan Harmeling

In this paper we adopt a representation-centric perspective on exploration in reinforcement learning, viewing exploration fundamentally as a density estimation problem. We investigate the effectiveness of clustering representations for exploration in 3-D environments, based on the observation that the importance of pixel changes between transitions is less pronounced in 3-D environments compared to 2-D environments, where pixel changes between transitions are typically distinct and significant. We propose a method that performs episodic and global clustering on random representations and on pre-trained DINO representations to count states, i.e., estimate pseudo-counts. Surprisingly, even random features can be clustered effectively to count states in 3-D environments; however, when these become visually more complex, pre-trained DINO representations are more effective thanks to the pre-trained inductive biases in the representations. Overall, this presents a pathway for integrating pre-trained biases into exploration. We evaluate our approach on the VizDoom and Habitat environments, demonstrating that our method surpasses other well-known exploration methods in these settings.
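
To make the idea of counting states via clusters concrete, here is a small sketch under simplifying assumptions: the centroids are fixed in advance rather than learned, the features are 2-D toy vectors rather than random or DINO representations, and the bonus 1/sqrt(n) is a common count-based choice that the abstract does not specify.

```python
import math
from collections import Counter

def nearest_cluster(feature, centroids):
    """Index of the centroid closest to the feature (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, centroids[k])),
    )

def count_bonuses(features, centroids):
    """Assign each visited state to a cluster and return 1/sqrt(count) bonuses."""
    counts = Counter()
    bonuses = []
    for feature in features:
        k = nearest_cluster(feature, centroids)
        counts[k] += 1  # pseudo-count of this cluster
        bonuses.append(1.0 / math.sqrt(counts[k]))
    return bonuses, counts

# Hypothetical centroids and state features for illustration only.
centroids = [[0.0, 0.0], [10.0, 10.0]]
features = [[0.1, 0.0], [0.2, 0.1], [9.9, 10.0], [0.0, 0.3]]

bonuses, counts = count_bonuses(features, centroids)
print(bonuses, dict(counts))
```

States that fall into a rarely visited cluster receive a larger bonus, so the agent is nudged toward regions of the representation space it has seen less often.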

CV | Mar 12, 2025
Oh-A-DINO: Understanding and Enhancing Attribute-Level Information in Self-Supervised Object-Centric Representations

Stefan Sylvius Wagner, Stefan Harmeling

Object-centric understanding is fundamental to human vision and required for complex reasoning. Traditional methods define slot-based bottlenecks to learn object properties explicitly, while recent self-supervised vision models like DINO have shown emergent object understanding. We investigate the effectiveness of self-supervised representations from models such as CLIP, DINOv2 and DINOv3, as well as slot-based approaches, for multi-object instance retrieval, where specific objects must be faithfully identified in a scene. This scenario is increasingly relevant as pre-trained representations are deployed in downstream tasks, e.g., retrieval, manipulation, and goal-conditioned policies that demand fine-grained object understanding. Our findings reveal that self-supervised vision models and slot-based representations excel at identifying edge-derived geometry (shape, size) but fail to preserve non-geometric surface-level cues (colour, material, texture), which are critical for disambiguating objects when reasoning about or selecting them in such tasks. We show that learning an auxiliary latent space over segmented patches, where VAE regularisation enforces compact, disentangled object-centric representations, recovers these missing attributes. Augmenting the self-supervised methods with such latents improves retrieval across all attributes, suggesting a promising direction for making self-supervised representations more reliable in downstream tasks that require precise object-level reasoning.

CL | Jun 3, 2025
Natural Language Processing to Enhance Deliberation in Political Online Discussions: A Survey

Maike Behrendt, Stefan Sylvius Wagner, Carina Weinmann et al.

Political online participation in the form of discussing political issues and exchanging opinions among citizens is gaining importance with more and more formats being held digitally. To come to a decision, a careful discussion and consideration of opinions and a civil exchange of arguments, which is defined as the act of deliberation, is desirable. The quality of discussions and participation processes in terms of their deliberativeness highly depends on the design of platforms and processes. To facilitate online communication for both participants and initiators, machine learning methods offer a lot of potential. In this work we want to showcase which issues occur in political online discussions and how machine learning can be used to counteract these issues and enhance deliberation.

CL | May 21, 2025
MaxPoolBERT: Enhancing BERT Classification via Layer- and Token-Wise Aggregation

Maike Behrendt, Stefan Sylvius Wagner, Stefan Harmeling

The [CLS] token in BERT is commonly used as a fixed-length representation for classification tasks, yet prior work has shown that both other tokens and intermediate layers encode valuable contextual information. In this work, we study lightweight extensions to BERT that refine the [CLS] representation by aggregating information across layers and tokens. Specifically, we explore three modifications: (i) max-pooling the [CLS] token across multiple layers, (ii) enabling the [CLS] token to attend over the entire final layer using an additional multi-head attention (MHA) layer, and (iii) combining max-pooling across the full sequence with MHA. Our approach, called MaxPoolBERT, enhances BERT's classification accuracy (especially on low-resource tasks) without requiring new pre-training or significantly increasing model size. Experiments on the GLUE benchmark show that MaxPoolBERT consistently outperforms the standard BERT base model on low-resource tasks.
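
Modification (i) can be pictured as an element-wise maximum over the [CLS] vectors of several layers. The sketch below uses plain Python lists with made-up 3-dimensional vectors instead of real BERT hidden states, and the choice to pool the last `last_k` layers is an assumption for illustration.

```python
def max_pool_cls(cls_per_layer, last_k):
    """Element-wise max over the [CLS] vectors of the last `last_k` layers."""
    selected = cls_per_layer[-last_k:]
    # zip(*selected) groups the values of each hidden dimension together.
    return [max(values) for values in zip(*selected)]

# Hypothetical [CLS] vectors, one per layer (real BERT-base uses 768 dims).
cls_per_layer = [
    [0.1, 0.9, -0.3],
    [0.4, 0.2, -0.1],
    [0.3, 0.5, -0.6],
]

pooled = max_pool_cls(cls_per_layer, last_k=2)
print(pooled)
```

The pooled vector has the same size as a single [CLS] vector, so it can be passed to the usual classification head without changing its input dimension.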

LG | Dec 8, 2023
Backward Learning for Goal-Conditioned Policies

Marc Höftmann, Jan Robine, Stefan Harmeling

Can we learn policies in reinforcement learning without rewards? Can we learn a policy just by trying to reach a goal state? We answer these questions positively by proposing a multi-step procedure that first learns a world model that goes backward in time, secondly generates goal-reaching backward trajectories, thirdly improves those sequences using shortest path finding algorithms, and finally trains a neural network policy by imitation learning. We evaluate our method on a deterministic maze environment where the observations are $64\times 64$ pixel bird's eye images and can show that it consistently reaches several goals.

LG | Oct 17, 2025
DFCA: Decentralized Federated Clustering Algorithm

Jonas Kirch, Sebastian Becker, Tiago Koketsu Rodrigues et al.

Clustered Federated Learning has emerged as an effective approach for handling heterogeneous data across clients by partitioning them into clusters with similar or identical data distributions. However, most existing methods, including the Iterative Federated Clustering Algorithm (IFCA), rely on a central server to coordinate model updates, which creates a bottleneck and a single point of failure, limiting their applicability in more realistic decentralized learning settings. In this work, we introduce DFCA, a fully decentralized clustered FL algorithm that enables clients to collaboratively train cluster-specific models without central coordination. DFCA uses a sequential running average to aggregate models from neighbors as updates arrive, providing a communication-efficient alternative to batch aggregation while maintaining clustering performance. Our experiments on various datasets demonstrate that DFCA outperforms other decentralized algorithms and performs comparably to centralized IFCA, even under sparse connectivity, highlighting its robustness and practicality for dynamic real-world decentralized networks.
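
The sequential running average mentioned above can be sketched as an incremental mean that folds in one neighbor model at a time. This is a generic illustration, assuming models are flat parameter vectors and that a client starts from its own model; it does not reproduce DFCA's cluster assignment or communication protocol.

```python
def running_average_update(avg, count, incoming):
    """Fold one incoming parameter vector into a running mean."""
    new_count = count + 1
    new_avg = [a + (x - a) / new_count for a, x in zip(avg, incoming)]
    return new_avg, new_count

# The client's own model counts as the first sample (assumption).
avg, count = [1.0, 2.0], 1

# Hypothetical neighbor models arriving one after another.
for neighbor_model in ([3.0, 4.0], [5.0, 6.0]):
    avg, count = running_average_update(avg, count, neighbor_model)

print(avg, count)
```

Only the current mean and a counter are stored, so updates can be applied as they arrive instead of buffering all neighbor models for a batch average.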

CL | Jun 18, 2024
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions

Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.

Stance detection holds great potential to improve online political discussions through its deployment in discussion platforms for purposes such as content moderation, topic summarization or to facilitate more balanced discussions. Typically, transformer-based models are employed directly for stance detection, requiring vast amounts of data. However, the wide variety of debate topics in online political discussions makes data collection particularly challenging. LLMs have revived stance detection, but their online deployment in online political discussions faces challenges like inconsistent outputs, biases, and vulnerability to adversarial attacks. We show how LLM-generated synthetic data can improve stance detection for online political discussions by using reliable traditional stance detection models for online deployment, while leveraging the text generation capabilities of LLMs for synthetic data generation in a secure offline environment. To achieve this, (i) we generate synthetic data for specific debate questions by prompting a Mistral-7B model and show that fine-tuning with the generated synthetic data can substantially improve the performance of stance detection, while remaining interpretable and aligned with real world data. (ii) Using the synthetic data as a reference, we can improve performance even further by identifying the most informative samples in an unlabelled dataset, i.e., those samples which the stance detection model is most uncertain about and can benefit from the most. By fine-tuning with both synthetic data and the most informative samples, we surpass the performance of the baseline model that is fine-tuned on all true labels, while labelling considerably less data.

IV | Jun 18, 2021
Non-Iterative Phase Retrieval With Cascaded Neural Networks

Tobias Uelwer, Tobias Hoffmann, Stefan Harmeling

Fourier phase retrieval is the problem of reconstructing a signal given only the magnitude of its Fourier transformation. Optimization-based approaches, like the well-established Gerchberg-Saxton or the hybrid input-output algorithm, struggle to reconstruct images from magnitudes that are not oversampled. This motivates the application of learned methods, which allow reconstruction from non-oversampled magnitude measurements after a learning phase. In this paper, we want to push the limits of these learned methods by means of a deep neural network cascade that reconstructs the image successively on different resolutions from its non-oversampled Fourier magnitude. We evaluate our method on four different datasets (MNIST, EMNIST, Fashion-MNIST, and KMNIST) and demonstrate that it yields improved performance over other non-iterative methods and optimization-based methods.

LG | Jun 18, 2021
Learning to Plan via a Multi-Step Policy Regression Method

Stefan Wagner, Michael Janschek, Tobias Uelwer et al.

We propose a new approach to increase inference performance in environments that require a specific sequence of actions in order to be solved. This is, for example, the case for maze environments where ideally an optimal path is determined. Instead of learning a policy for a single step, we want to learn a policy that can predict n actions in advance. Our proposed method, called policy horizon regression (PHR), uses knowledge of the environment sampled by A2C to learn an n-dimensional policy vector in a policy distillation setup, which yields n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show drastic speedup during inference time by successfully predicting sequences of actions on a single observation.

CV | Apr 7, 2021
Contour Proposal Networks for Biomedical Instance Segmentation

Eric Upschulte, Stefan Harmeling, Katrin Amunts et al.

We present a conceptually simple framework for object instance segmentation called Contour Proposal Network (CPN), which detects possibly overlapping objects in an image while simultaneously fitting closed object contours using an interpretable, fixed-sized representation based on Fourier Descriptors. The CPN can incorporate state-of-the-art object detection architectures as backbone networks into a single-stage instance segmentation model that can be trained end-to-end. We construct CPN models with different backbone networks, and apply them to instance segmentation of cells in datasets from different modalities. In our experiments, we show CPNs that outperform U-Nets and Mask R-CNNs in instance segmentation accuracy, and present variants with execution times suitable for real-time applications. The trained models generalize well across different domains of cell types. Since the main assumption of the framework is closed object contours, it is applicable to a wide range of detection problems also outside the biomedical domain. An implementation of the model architecture in PyTorch is freely available.
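
To illustrate how a fixed-size set of Fourier descriptors encodes a closed contour, the sketch below evaluates a complex Fourier series z(t) = sum_k c_k exp(2*pi*i*k*t) at evenly spaced t in [0, 1). This complex-exponential form is one common convention and is assumed here; the CPN's exact parameterization may differ, and the coefficients are hand-picked rather than predicted by a network.

```python
import cmath

def contour_from_descriptors(coeffs, num_points):
    """Sample a closed contour from complex Fourier coefficients.

    coeffs maps the frequency index k to the complex coefficient c_k.
    Returns a list of (x, y) points.
    """
    points = []
    for i in range(num_points):
        t = i / num_points
        z = sum(c * cmath.exp(2j * cmath.pi * k * t) for k, c in coeffs.items())
        points.append((z.real, z.imag))
    return points

# Hypothetical descriptors: c_0 is the centre (1, 1), c_1 gives a circle
# of radius 2 traversed counter-clockwise.
coeffs = {0: 1 + 1j, 1: 2 + 0j}

points = contour_from_descriptors(coeffs, num_points=4)
print(points)
```

Adding higher-frequency coefficients deforms the circle into more detailed shapes while the contour stays closed by construction, which is what makes the representation both compact and interpretable.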

IV | Mar 9, 2021
2D histology meets 3D topology: Cytoarchitectonic brain mapping with Graph Neural Networks

Christian Schiffer, Stefan Harmeling, Katrin Amunts et al.

Cytoarchitecture describes the spatial organization of neuronal cells in the brain, including their arrangement into layers and columns with respect to cell density, orientation, or presence of certain cell types. It allows segregating the brain into cortical areas and subcortical nuclei, links structure with connectivity and function, and provides a microstructural reference for human brain atlases. Mapping boundaries between areas requires scanning histological sections at microscopic resolution. While recent high-throughput scanners allow scanning a complete human brain in the order of a year, it is practically impossible to delineate regions at the same pace using the established gold standard method. Researchers have recently addressed cytoarchitectonic mapping of cortical regions with deep neural networks, relying on image patches from individual 2D sections for classification. However, the 3D context, which is needed to disambiguate complex or obliquely cut brain regions, is not taken into account. In this work, we combine 2D histology with 3D topology by reformulating the mapping task as a node classification problem on an approximate 3D midsurface mesh through the isocortex. We extract deep features from cortical patches in 2D histological sections which are descriptive of cytoarchitecture, and assign them to the corresponding nodes on the 3D mesh to construct a large attributed graph. By solving the brain mapping problem on this graph using graph neural networks, we obtain significantly improved classification results. The proposed framework lends itself nicely to integration of additional neuroanatomical priors for mapping.

IV | Nov 25, 2020
Contrastive Representation Learning for Whole Brain Cytoarchitectonic Mapping in Histological Human Brain Sections

Christian Schiffer, Katrin Amunts, Stefan Harmeling et al.

Cytoarchitectonic maps provide microstructural reference parcellations of the brain, describing its organization in terms of the spatial arrangement of neuronal cell bodies as measured from histological tissue sections. Recent work provided the first automatic segmentations of cytoarchitectonic areas in the visual system using Convolutional Neural Networks. We aim to extend this approach to become applicable to a wider range of brain areas, envisioning a solution for mapping the complete human brain. Inspired by recent success in image classification, we propose a contrastive learning objective for encoding microscopic image patches into robust microstructural features, which are efficient for cytoarchitectonic area classification. We show that a model pre-trained using this learning task outperforms a model trained from scratch, as well as a model pre-trained on a recently proposed auxiliary task. We perform cluster analysis in the feature space to show that the learned representations form anatomically meaningful groups.

IVNov 25, 2020
Convolutional Neural Networks for cytoarchitectonic brain mapping at large scale

Christian Schiffer, Hannah Spitzer, Kai Kiwitz et al.

Human brain atlases provide spatial reference systems for data characterizing brain organization at different levels, coming from different brains. Cytoarchitecture is a basic principle of the microstructural organization of the brain, as regional differences in the arrangement and composition of neuronal cells are indicators of changes in connectivity and function. Automated scanning procedures and observer-independent methods are prerequisites to reliably identify cytoarchitectonic areas, and to achieve reproducible models of brain segregation. Time becomes a key factor when moving from the analysis of single regions of interest towards high-throughput scanning of large series of whole-brain sections. Here we present a new workflow for mapping cytoarchitectonic areas in large series of cell-body stained histological sections of human postmortem brains. It is based on a Deep Convolutional Neural Network (CNN), which is trained on a pair of section images with annotations, with a large number of un-annotated sections in between. The model learns to create all missing annotations in between with high accuracy, and faster than our previous workflow based on observer-independent mapping. The new workflow does not require preceding 3D-reconstruction of sections, and is robust against histological artefacts. It processes large data sets with sizes in the order of multiple Terabytes efficiently. The workflow was integrated into a web interface, to allow access without expertise in deep learning and batch computing. Applying deep neural networks for cytoarchitectonic mapping opens new perspectives to enable high-resolution models of brain areas, introducing CNNs to identify borders of brain areas.

LGOct 12, 2020
Smaller World Models for Reinforcement Learning

Jan Robine, Tobias Uelwer, Stefan Harmeling

Sample efficiency remains a fundamental issue of reinforcement learning. Model-based algorithms try to make better use of data by simulating the environment with a model. We propose a new neural network architecture for world models based on a vector-quantized variational autoencoder (VQ-VAE) to encode observations and a convolutional LSTM to predict the next embedding indices. A model-free PPO agent is trained purely on simulated experience from the world model. We adopt the setup introduced by Kaiser et al. (2020), which only allows 100K interactions with the real environment. We apply our method to 36 Atari environments and show that we reach performance comparable to their SimPLe algorithm, while our model is significantly smaller.
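
The "embedding indices" mentioned in this abstract come from the vector quantization step of a VQ-VAE, where each latent vector is replaced by the index of its nearest codebook entry. The following is a minimal sketch of that lookup only, not the authors' implementation: the function names, the toy codebook, and the latent values are all hypothetical, and a real model would learn the codebook and operate on tensors rather than Python lists.

```python
from typing import List, Sequence


def nearest_code_index(vector: Sequence[float],
                       codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vector`
    under squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map every latent vector to its nearest codebook index."""
    return [nearest_code_index(v, codebook) for v in latents]


# Hypothetical toy codebook with three 2-D embeddings.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
# Hypothetical encoder outputs for three spatial positions.
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

indices = quantize(latents, codebook)
print(indices)
```

A sequence model such as the convolutional LSTM named in the abstract would then be trained to predict the next grid of such indices rather than raw pixels.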

CVJun 26, 2020
Fast Multi-Level Foreground Estimation

Thomas Germer, Tobias Uelwer, Stefan Conrad et al.

Alpha matting aims to estimate the translucency of an object in a given image. The resulting alpha matte describes pixel-wise to what extent foreground and background colors contribute to the color of the composite image. While most methods in the literature focus on estimating the alpha matte, the process of estimating the foreground colors given the input image and its alpha matte is often neglected, although foreground estimation is an essential part of many image editing workflows. In this work, we propose a novel method for foreground estimation given the alpha matte. We demonstrate that our fast multi-level approach yields results that are comparable with the state-of-the-art while outperforming those methods in computational runtime and memory usage.
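
The relationship between alpha matte, foreground, and background described in this abstract is the standard compositing model I = αF + (1 − α)B, applied per pixel and per color channel. The sketch below illustrates only that forward model, under the assumption of colors in the range 0 to 1; it is not the multi-level estimation method of the paper, and the function name and sample values are hypothetical.

```python
from typing import Tuple

Color = Tuple[float, float, float]


def composite(alpha: float, foreground: Color, background: Color) -> Color:
    """Blend one pixel: I = alpha * F + (1 - alpha) * B, per channel."""
    r, g, b = (alpha * f + (1.0 - alpha) * bg
               for f, bg in zip(foreground, background))
    return (r, g, b)


# Hypothetical pixel: 25% red foreground over a blue background.
pixel = composite(0.25, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(pixel)
```

Foreground estimation is the inverse direction: given I and α, recover F (and B). That problem is underdetermined for a single pixel, which is why it calls for a dedicated method like the one proposed here.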

IVDec 10, 2019
Phase Retrieval Using Conditional Generative Adversarial Networks

Tobias Uelwer, Alexander Oberstraß, Stefan Harmeling

In this paper, we propose the application of conditional generative adversarial networks to solve various phase retrieval problems. We show that including knowledge of the measurement process at training time leads to an optimization at test time that is more robust to initialization than existing approaches involving generative models. In addition, conditioning the generator network on the measurements enables us to achieve much more detailed results. We empirically demonstrate that these advantages provide meaningful solutions to the Fourier and the compressive phase retrieval problem and that our method outperforms well-established projection-based methods as well as existing methods that are based on neural networks. Like other deep learning methods, our approach is very robust to noise and can therefore be very useful for real-world applications.

LGJun 9, 2019
On the Vulnerability of Capsule Networks to Adversarial Attacks

Felix Michels, Tobias Uelwer, Eric Upschulte et al.

This paper extensively evaluates the vulnerability of capsule networks to different adversarial attacks. Recent work suggests that these architectures are more robust towards adversarial attacks than other neural networks. However, our experiments show that capsule networks can be fooled as easily as convolutional neural networks.

LGFeb 5, 2019
Modular Block-diagonal Curvature Approximations for Feedforward Architectures

Felix Dangel, Stefan Harmeling, Philipp Hennig

We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.

CVJun 13, 2018
Improving Cytoarchitectonic Segmentation of Human Brain Areas with Self-supervised Siamese Networks

Hannah Spitzer, Kai Kiwitz, Katrin Amunts et al.

Cytoarchitectonic parcellations of the human brain serve as anatomical references in multimodal atlas frameworks. They are based on analysis of cell-body stained histological sections and the identification of borders between brain areas. The de facto standard involves a semi-automatic, reproducible border detection, but does not scale with high-throughput imaging in large series of sections at microscopic resolution. Automatic parcellation, however, is extremely challenging due to high variation in the data, and the need for a large field of view at microscopic resolution. The performance of a recently proposed Convolutional Neural Network model that addresses this problem especially suffers from the naturally limited amount of expert annotations for training. To circumvent this limitation, we propose to pre-train neural networks on a self-supervised auxiliary task, predicting the 3D distance between two patches sampled from the same brain. Compared to a random initialization, fine-tuning from these networks results in significantly better segmentations. We show that the self-supervised model has implicitly learned to distinguish several cortical brain areas -- a strong indicator that the proposed auxiliary task is appropriate for cytoarchitectonic mapping.

CVMay 30, 2017
Parcellation of Visual Cortex on high-resolution histological Brain Sections using Convolutional Neural Networks

Hannah Spitzer, Katrin Amunts, Stefan Harmeling et al.

Microscopic analysis of histological sections is considered the "gold standard" to verify structural parcellations in the human brain. Its high resolution allows the study of laminar and columnar patterns of cell distributions, which build an important basis for the simulation of cortical areas and networks. However, such cytoarchitectonic mapping is a semiautomatic, time-consuming process that does not scale with high-throughput imaging. We present an automatic approach for parcellating histological sections at 2 µm resolution. It is based on a convolutional neural network that combines topological information from probabilistic atlases with the texture features learned from high-resolution cell-body stained images. The model is applied to visual areas and trained on a sparse set of partial annotations. We show how predictions are transferable to new brains and spatially consistent across sections.

CVJun 28, 2014
Learning to Deblur

Christian J. Schuler, Michael Hirsch, Stefan Harmeling et al.

We describe a learning-based approach to blind image deconvolution. It uses a deep layered architecture, parts of which are borrowed from recent work on neural network learning, and parts of which incorporate computations that are specific to image deconvolution. The system is trained end-to-end on a set of artificially generated training examples, enabling competitive performance in blind deconvolution, both with respect to quality and runtime.

OPTICSMar 1, 2013
On a link between kernel mean maps and Fraunhofer diffraction, with an application to super-resolution beyond the diffraction limit

Stefan Harmeling, Michael Hirsch, Bernhard Schölkopf

We establish a link between Fourier optics and a recent construction from the machine learning community termed the kernel mean map. Using the Fraunhofer approximation, it identifies the kernel with the squared Fourier transform of the aperture. This allows us to use results about the invertibility of the kernel mean map to provide a statement about the invertibility of Fraunhofer diffraction, showing that imaging processes with arbitrarily small apertures can in principle be invertible, i.e., do not lose information, provided the objects to be imaged satisfy a generic condition. A real-world experiment shows that we can super-resolve beyond the Rayleigh limit.