LGApr 5, 2022
Complex-Valued Autoencoders for Object DiscoverySindy Löwe, Phillip Lippe, Maja Rudolph et al.
Object-centric representations form the basis of human perception, and enable us to reason about the world and to systematically generalize to new settings. Currently, most works on unsupervised object discovery focus on slot-based approaches, which explicitly separate the latent representations of individual objects. While the result is easily interpretable, it usually requires the design of involved architectures. In contrast to this, we propose a comparatively simple approach - the Complex AutoEncoder (CAE) - that creates distributed object-centric representations. Following a coding scheme theorized to underlie object representations in biological neurons, its complex-valued activations represent two messages: their magnitudes express the presence of a feature, while the relative phase differences between neurons express which features should be bound together to create joint object representations. In contrast to previous approaches using complex-valued activations for object discovery, we present a fully unsupervised approach that is trained end-to-end - resulting in significant improvements in performance and efficiency. Further, we show that the CAE achieves competitive or better unsupervised object discovery performance on simple multi-object datasets compared to a state-of-the-art slot-based approach while being up to 100 times faster to train.
LGJun 1, 2023
Rotating Features for Object DiscoverySindy Löwe, Phillip Lippe, Francesco Locatello et al.
The binding problem in human cognition, concerning how the brain represents and connects objects within a fixed network of neural connections, remains a subject of intense debate. Most machine learning efforts addressing this issue in an unsupervised setting have focused on slot-based methods, which may be limiting due to their discrete nature and difficulty to express uncertainty. Recently, the Complex AutoEncoder was proposed as an alternative that learns continuous and distributed object-centric representations. However, it is only applicable to simple toy data. In this paper, we present Rotating Features, a generalization of complex-valued features to higher dimensions, and a new evaluation procedure for extracting objects from distributed representations. Additionally, we show the applicability of our approach to pre-trained features. Together, these advancements enable us to scale distributed object-centric representations from simple toy to real-world data. We believe this work advances a new paradigm for addressing the binding problem in machine learning and has the potential to inspire further innovation in the field.
LGJun 13, 2022
Causal Representation Learning for Instantaneous and Temporal Effects in Interactive SystemsPhillip Lippe, Sara Magliacane, Sindy Löwe et al.
Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measurement or frame rate might be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that allows for instantaneous effects in intervened temporal sequences when intervention targets can be observed, e.g., as actions of an agent. iCITRIS identifies the potentially multidimensional causal variables from temporal observations, while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three datasets of interactive systems, iCITRIS accurately identifies the causal variables and their causal graph.
LGJun 16, 2023
BISCUIT: Causal Representation Learning from Binary InteractionsPhillip Lippe, Sara Magliacane, Sindy Löwe et al.
Identifying the causal variables of an environment and how to intervene on them is of core value in applications such as robotics and embodied AI. While an agent can commonly interact with the environment and may implicitly perturb the behavior of some of these causal variables, often the targets it affects remain unknown. In this paper, we show that causal variables can still be identified for many common setups, e.g., additive Gaussian noise models, if the agent's interactions with a causal variable can be described by an unknown binary variable. This happens when each causal variable has two different mechanisms, e.g., an observational and an interventional one. Using this identifiability result, we propose BISCUIT, a method for simultaneously learning causal variables and their corresponding binary interaction variables. On three robotic-inspired datasets, BISCUIT accurately identifies causal variables and can even be scaled to complex, realistic environments for embodied AI.
CVNov 28, 2023
Image segmentation with traveling waves in an exactly solvable recurrent neural networkLuisa H. B. Liboni, Roberto C. Budzinski, Alexandra N. Busch et al.
We study image segmentation using spatiotemporal dynamics in a recurrent neural network where the state of each unit is given by a complex number. We show that this network generates sophisticated spatiotemporal dynamics that can effectively divide an image into groups according to a scene's structural characteristics. Using an exact solution of the recurrent network's dynamics, we present a precise description of the mechanism underlying object segmentation in this network, providing a clear mathematical interpretation of how the network performs this task. We then demonstrate a simple algorithm for object segmentation that generalizes across inputs ranging from simple geometric objects in grayscale images to natural images. Object segmentation across all images is accomplished with one recurrent neural network that has a single, fixed set of weights. This demonstrates the expressive potential of recurrent neural networks when constructed using a mathematical approach that brings together their structure, dynamics, and computation.
LGOct 17, 2024Code
Artificial Kuramoto Oscillatory NeuronsTakeru Miyato, Sindy Löwe, Andreas Geiger et al.
It has long been known in both neuroscience and AI that ``binding'' between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. More recently, it was also hypothesized that dynamic (spatiotemporal) representations play an important role in both neuroscience and AI. Building on these ideas, we introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. Our generalized Kuramoto updates bind neurons together through their synchronization dynamics. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, calibrated uncertainty quantification, and reasoning. We believe that these empirical results show the importance of rethinking our assumptions at the most basic neuronal level of neural representation, and in particular show the importance of dynamical representations. Code:https://github.com/autonomousvision/akorn Project page:https://takerum.github.io/akorn_project_page/
LGApr 27
BaLoRA: Bayesian Low-Rank Adaptation of Large Scale ModelsDario Coscia, Sindy Löwe, Max Welling
Low-Rank Adaptation (LoRA) has become the standard for fine-tuning large pre-trained models at reduced computational cost. However, its low-rank point-estimate updates limit expressiveness, leave a persistent gap relative to full fine-tuning accuracy, and provide no built-in uncertainty quantification, limiting its applicability in settings where reliability matters as much as accuracy. We introduce BaLoRA, a Bayesian extension of LoRA with a novel input-adaptive Bayesian parameterization of LoRA matrices that adds minimal parameters and compute. Surprisingly, not only does the Bayesian extension yield well-calibrated uncertainty estimates, but the adaptive noise injection underlying our approach also significantly improves prediction accuracy, narrowing the gap with full fine-tuning across both natural language reasoning and vision tasks. When applied to band gap prediction in metal-organic frameworks, BaLoRA produces zero-shot test-time uncertainty estimates that correlate more strongly with model error than a trained ensemble of LoRA models, and improve monotonically with compute without sacrificing accuracy.
MTRL-SCIAug 5, 2025
The Open DAC 2025 Dataset for Sorbent Discovery in Direct Air CaptureAnuroop Sriram, Logan M. Brabson, Xiaohan Yu et al. · baidu, cmu
Identifying useful sorbent materials for direct air capture (DAC) from humid air remains a challenge. We present the Open DAC 2025 (ODAC25) dataset, a significant expansion and improvement upon ODAC23 (Sriram et al., ACS Central Science, 10 (2024) 923), comprising nearly 60 million DFT single-point calculations for CO$_2$, H$_2$O, N$_2$, and O$_2$ adsorption in 15,000 MOFs. ODAC25 introduces chemical and configurational diversity through functionalized MOFs, high-energy GCMC-derived placements, and synthetically generated frameworks. ODAC25 also significantly improves upon the accuracy of DFT calculations and the treatment of flexible MOFs in ODAC23. Along with the dataset, we release new state-of-the-art machine-learned interatomic potentials trained on ODAC25 and evaluate them on adsorption energy and Henry's law coefficient predictions.
LGFeb 8, 2024
Binding Dynamics in Rotating FeaturesSindy Löwe, Francesco Locatello, Max Welling
In human cognition, the binding problem describes the open question of how the brain flexibly integrates diverse information into cohesive object representations. Analogously, in machine learning, there is a pursuit for models capable of strong generalization and reasoning by learning object-centric representations in an unsupervised manner. Drawing from neuroscientific theories, Rotating Features learn such representations by introducing vector-valued features that encapsulate object characteristics in their magnitudes and object affiliation in their orientations. The "$χ$-binding" mechanism, embedded in every layer of the architecture, has been shown to be crucial, but remains poorly understood. In this paper, we propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly, and we show that it achieves equivalent performance. This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics for object-centric representations to emerge in Rotating Features.
LGFeb 7, 2022
CITRIS: Causal Identifiability from Temporal Intervened SequencesPhillip Lippe, Sara Magliacane, Sindy Löwe et al.
Understanding the latent causal factors of a dynamical system from visual observations is considered a crucial step towards agents reasoning in complex environments. In this paper, we propose CITRIS, a variational autoencoder framework that learns causal representations from temporal sequences of images in which underlying causal factors have possibly been intervened upon. In contrast to the recent literature, CITRIS exploits temporality and observing intervention targets to identify scalar and multidimensional causal factors, such as 3D rotation angles. Furthermore, by introducing a normalizing flow, CITRIS can be easily extended to leverage and disentangle representations obtained by already pretrained autoencoders. Extending previous results on scalar causal factors, we prove identifiability in a more general setting, in which only some components of a causal factor are affected by interventions. In experiments on 3D rendered image sequences, CITRIS outperforms previous methods on recovering the underlying causal variables. Moreover, using pretrained autoencoders, CITRIS can even generalize to unseen instantiations of causal factors, opening future research areas in sim-to-real generalization for causal representation learning.
CVJul 16, 2021
Contrastive Predictive Coding for Anomaly DetectionPuck de Haan, Sindy Löwe
Reliable detection of anomalies is crucial when deploying machine learning models in practice, but remains challenging due to the lack of labeled data. To tackle this challenge, contrastive learning approaches are becoming increasingly popular, given the impressive results they have achieved in self-supervised representation learning settings. However, while most existing contrastive anomaly detection and segmentation approaches have been applied to images, none of them can use the contrastive losses directly for both anomaly detection and segmentation. In this paper, we close this gap by making use of the Contrastive Predictive Coding model (arXiv:1807.03748). We show that its patch-wise contrastive loss can directly be interpreted as an anomaly score, and how this allows for the creation of anomaly segmentation masks. The resulting model achieves promising results for both anomaly detection and segmentation on the challenging MVTec-AD dataset.
CVNov 20, 2020
Learning Object-Centric Video Models by Contrasting SetsSindy Löwe, Klaus Greff, Rico Jonschkowski et al.
Contrastive, self-supervised learning of object representations recently emerged as an attractive alternative to reconstruction-based training. Prior approaches focus on contrasting individual object representations (slots) against one another. However, a fundamental problem with this approach is that the overall contrastive loss is the same for (i) representing a different object in each slot, as it is for (ii) (re-)representing the same object in all slots. Thus, this objective does not inherently push towards the emergence of object-centric representations in the slots. We address this problem by introducing a global, set-based contrastive loss: instead of contrasting individual slot representations against one another, we aggregate the representations and contrast the joined sets against one another. Additionally, we introduce attention-based encoders to this contrastive setup which simplifies training and provides interpretable object masks. Our results on two synthetic video datasets suggest that this approach compares favorably against previous contrastive methods in terms of reconstruction, future prediction and object separation performance.
LGJun 18, 2020
Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series DataSindy Löwe, David Madras, Richard Zemel et al.
On time-series data, most causal discovery methods fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information which is lost when following this approach. Specifically, different samples may share the dynamics which describe the effects of their causal relations. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus leverages the shared dynamics information. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under added noise and hidden confounding.
LGMay 28, 2019
Putting An End to End-to-End: Gradient-Isolated Learning of RepresentationsSindy Löwe, Peter O'Connor, Bastiaan S. Veeling
We propose a novel deep learning method for local self-supervised representation learning that does not require labels nor end-to-end backpropagation but exploits the natural order in data instead. Inspired by the observation that biological neural networks appear to learn without backpropagating a global error signal, we split a deep neural network into a stack of gradient-isolated modules. Each module is trained to maximally preserve the information of its inputs using the InfoNCE bound from Oord et al. [2018]. Despite this greedy training, we demonstrate that each module improves upon the output of its predecessor, and that the representations created by the top module yield highly competitive results on downstream classification tasks in the audio and visual domain. The proposal enables optimizing modules asynchronously, allowing large-scale distributed training of very deep neural networks on unlabelled datasets.
CVJul 5, 2018
Improving Unsupervised Defect Segmentation by Applying Structural Similarity to AutoencodersPaul Bergmann, Sindy Löwe, Michael Fauser et al.
Convolutional autoencoders have emerged as popular methods for unsupervised defect segmentation on image data. Most commonly, this task is performed by thresholding a pixel-wise reconstruction error based on an $\ell^p$ distance. This procedure, however, leads to large residuals whenever the reconstruction encompasses slight localization inaccuracies around edges. It also fails to reveal defective regions that have been visually altered when intensity values stay roughly consistent. We show that these problems prevent these approaches from being applied to complex real-world scenarios and that it cannot be easily avoided by employing more elaborate architectures such as variational or feature matching autoencoders. We propose to use a perceptual loss function based on structural similarity which examines inter-dependencies between local image regions, taking into account luminance, contrast and structural information, instead of simply comparing single pixel values. It achieves significant performance gains on a challenging real-world dataset of nanofibrous materials and a novel dataset of two woven fabrics over the state of the art approaches for unsupervised defect segmentation that use pixel-wise reconstruction error metrics.