LG · Oct 9, 2023
Diagnosing Catastrophe: Large parts of accuracy loss in continual learning can be accounted for by readout misalignment
Daniel Anthes, Sushrut Thorat, Peter König et al.
Unlike in primates, training artificial neural networks on changing data distributions leads to a rapid decrease in performance on old tasks. This phenomenon is commonly referred to as catastrophic forgetting. In this paper, we investigate the representational changes that underlie this performance decrease and identify three distinct processes that together account for the phenomenon. The largest component is a misalignment between hidden representations and readout layers. Misalignment occurs due to learning on additional tasks and causes internal representations to shift. Representational geometry is partially conserved under this misalignment and only a small part of the information is irrecoverably lost. All types of representational changes scale with the dimensionality of hidden representations. These insights have implications for deep learning applications that need to be continuously updated, but may also aid in aligning ANN models with the rather robust biological vision.
LG · Oct 7, 2023
Keep Moving: identifying task-relevant subspaces to maximise plasticity for newly learned tasks
Daniel Anthes, Sushrut Thorat, Peter König et al.
Continual learning algorithms strive to acquire new knowledge while preserving prior information. Often, these algorithms emphasise stability and restrict network updates upon learning new tasks. In many cases, such restrictions come at a cost to the model's plasticity, i.e. the model's ability to adapt to the requirements of a new task. But is all change detrimental? Here, we approach this question by proposing that activation spaces in neural networks can be decomposed into two subspaces: a readout range in which change affects prior tasks and a null space in which change does not alter prior performance. Based on experiments with this novel technique, we show that, indeed, not all activation change is associated with forgetting. Instead, only change in the subspace visible to the readout of a task can lead to decreased stability, while restricting change outside of this subspace is associated only with a loss of plasticity. Analysing various commonly used algorithms, we show that regularisation-based techniques do not fully disentangle the two spaces and, as a result, restrict plasticity more than need be. We expand our results by investigating a linear model in which we can manipulate learning in the two subspaces directly and thus causally link activation changes to stability and plasticity. For hierarchical, nonlinear cases, we present an approximation that enables us to estimate functionally relevant subspaces at every layer of a deep nonlinear network, corroborating our previous insights. Together, this work provides novel means to derive insights into the mechanisms behind stability and plasticity in continual learning and may serve as a diagnostic tool to guide developments of future continual learning algorithms that stabilise inference while allowing maximal space for learning.
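The decomposition described in this abstract can be illustrated for the simplest case, a single linear readout. The sketch below is not the authors' implementation: it assumes the readout is a plain weight matrix, takes the subspace visible to the readout to be the row space of that matrix, and treats everything orthogonal to it as the null space. The function names and the example weights are hypothetical.

```python
import math


def orthonormal_basis(rows, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given row vectors."""
    basis = []
    for row in rows:
        v = list(row)
        for b in basis:
            coeff = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip rows that add no new direction
            basis.append([vi / norm for vi in v])
    return basis


def split_change(readout, delta):
    """Split an activation change into the part a linear readout can see
    (row space of the readout weights) and the part it cannot (null space)."""
    visible = [0.0] * len(delta)
    for b in orthonormal_basis(readout):
        coeff = sum(d * bi for d, bi in zip(delta, b))
        visible = [v + coeff * bi for v, bi in zip(visible, b)]
    hidden = [d - v for d, v in zip(delta, visible)]
    return visible, hidden


def apply_readout(readout, h):
    """Linear readout: one dot product per output unit."""
    return [sum(w * x for w, x in zip(row, h)) for row in readout]


# Hypothetical readout: 2 outputs computed from 4 hidden units.
readout = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 2.0, 0.0, 0.0]]
# Hypothetical change in hidden activations after training on a new task.
delta = [0.5, -1.0, 1.5, 3.0]

visible, hidden = split_change(readout, delta)
print(apply_readout(readout, hidden))   # approximately zero in every output
print(apply_readout(readout, visible))  # same effect as the full change
print(apply_readout(readout, delta))
```

In this toy setting, the null-space component can be arbitrarily large without altering the old task's outputs, which is the sense in which restricting it would cost plasticity without buying stability.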
HC · Dec 22, 2020 · Code
WestDrive X LoopAR: An open-access virtual reality project in Unity for evaluating user interaction methods during TOR
Farbod N. Nezami, Maximilian A. Wächter, Nora Maleki et al.
With the further development of highly automated vehicles, drivers will engage in non-driving-related tasks while being driven. Still, drivers have to take over control when requested by the car. This raises the question of how potentially distracted drivers can get back into the control loop quickly and safely when the car requests a takeover. To investigate effective human-machine interactions, a mobile, versatile, and cost-efficient setup is needed. We developed a virtual reality toolkit for the Unity 3D game engine containing all necessary code and assets to enable fast adaptations to various human-machine interaction experiments, including close monitoring of the subject. The presented project contains all functionality needed for realistic traffic behavior, cars, and pedestrians, as well as a large, open-source, scriptable, and modular VR environment. It covers roughly 25 square km and includes a package of 125 animated pedestrians and numerous vehicles, including motorbikes, trucks, and cars. It also contains all the nature assets needed to make it both highly dynamic and realistic. The presented repository contains a C++ library made for LoopAR that enables force feedback for gaming steering wheels as a fully supported component. It also includes all necessary scripts for eye tracking on the devices used. All main functions are integrated into the graphical user interface of the Unity Editor or are available as prefab variants to ease the use of the embedded functionalities. The primary purpose of this project is to serve as an open-access, cost-efficient toolkit that enables interested researchers to conduct realistic virtual reality research studies without costly and immobile simulators.
LG · Sep 28, 2025
Brain-language fusion enables interactive neural readout and in-silico experimentation
Victoria Bosch, Daniel Anthes, Adrien Doerig et al.
Large language models (LLMs) have revolutionized human-machine interaction, and have been extended by embedding diverse modalities such as images into a shared language space. Yet, neural decoding has remained constrained by static, non-interactive methods. We introduce CorText, a framework that integrates neural activity directly into the latent space of an LLM, enabling open-ended, natural language interaction with brain data. Trained on fMRI data recorded during viewing of natural scenes, CorText generates accurate image captions and can answer more detailed questions better than controls, while having access to neural data only. We showcase that CorText achieves zero-shot generalization beyond semantic categories seen during training. Furthermore, we present a counterfactual analysis that emulates in-silico cortical microstimulation. These advances mark a shift from passive decoding toward generative, flexible interfaces between brain activity and language.
LG · Feb 3, 2021
Fast Concept Mapping: The Emergence of Human Abilities in Artificial Neural Networks when Learning Embodied and Self-Supervised
Viviane Clay, Peter König, Gordon Pipa et al.
Most artificial neural networks used for object detection and recognition are trained in a fully supervised setup. This is not only very resource-consuming, as it requires large data sets of labeled examples, but also very different from how humans learn. We introduce a setup in which an artificial agent first learns in a simulated world through self-supervised exploration. Following this, the representations learned through interaction with the world can be used to associate semantic concepts such as different types of doors. To do this, we use a method we call fast concept mapping, which uses correlated firing patterns of neurons to define and detect semantic concepts. This association works instantaneously with very few labeled examples, similar to what we observe in humans in a phenomenon called fast mapping. Strikingly, this method already identifies objects with as little as one labeled example, which highlights the quality of the encoding learned in a self-supervised manner through embodiment using curiosity-driven exploration. It therefore presents a feasible strategy for learning concepts without much supervision and shows that meaningful representations of an environment can be learned through pure interaction with the world.
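The abstract does not specify how firing patterns are turned into concepts, so the following is only a rough illustration of the general idea, not the published method. It assumes a concept can be defined as the set of units that are active in every labeled example, and that detection means checking how many of those units fire for a new input. The threshold, the function names, and the activation values are all hypothetical.

```python
def active_set(activation, threshold=0.0):
    """Indices of units whose activation exceeds the threshold."""
    return {i for i, a in enumerate(activation) if a > threshold}


def define_concept(examples, threshold=0.0):
    """Assumed concept definition: units active in every labeled example."""
    sets = [active_set(e, threshold) for e in examples]
    return set.intersection(*sets)


def concept_score(concept, activation, threshold=0.0):
    """Fraction of the concept's units that are active for a new input."""
    if not concept:
        return 0.0
    return len(concept & active_set(activation, threshold)) / len(concept)


# Hypothetical hidden-layer activations for two labeled "door" observations.
door_examples = [
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.1, 0.6, 0.0],
]
door = define_concept(door_examples)

print(sorted(door))                                  # units shared by both examples
print(concept_score(door, [0.5, 0.0, 0.9, 0.0]))     # all concept units active
print(concept_score(door, [0.0, 0.4, 0.0, 0.3]))     # no concept unit active
```

Because the concept is built from set operations on existing activations, no gradient step is involved, which is one way to read the abstract's claim that the association works instantaneously. A single labeled example is enough to call `define_concept`.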
CV · Sep 24, 2019
Enhancing Traffic Scene Predictions with Generative Adversarial Networks
Peter König, Sandra Aigner, Marco Körner
We present a new two-stage pipeline for predicting frames of traffic scenes where relevant objects can still reliably be detected. Using a recent video prediction network, we first generate a sequence of future frames based on past frames. A second network then enhances these frames in order to make them appear more realistic. This ensures that the quality of the predicted frames is sufficient to enable accurate detection of objects, which is especially important for autonomously driving cars. To verify this two-stage approach, we conducted experiments on the Cityscapes dataset. For enhancing, we trained two image-to-image translation methods based on generative adversarial networks, one for blind motion deblurring and one for image super-resolution. All resulting predictions were quantitatively evaluated using both traditional metrics and a state-of-the-art object detection network, showing that the enhanced frames appear qualitatively improved. While the traditional image comparison metrics, i.e., MSE, PSNR, and SSIM, failed to confirm this visual impression, the object detection evaluation reflects it well. The best-performing prediction-enhancement pipeline is able to increase the average precision values for detecting cars by about 9% for each prediction step, compared to the non-enhanced predictions.
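Two of the traditional metrics named in this abstract, MSE and PSNR, are simple enough to write out, which helps show why they can disagree with visual impression: both compare pixel values position by position and know nothing about objects. The sketch below uses the standard definitions on flat lists of 8-bit pixel values; the pixel data is made up for illustration and is not from the paper. SSIM is omitted because it needs windowed statistics.

```python
import math


def mse(reference, predicted):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)


def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mse(reference, predicted)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


# Made-up 8-bit pixel values for a ground-truth frame and a predicted frame.
ground_truth = [0, 128, 255, 64]
prediction = [0, 130, 250, 60]

print(mse(ground_truth, prediction))   # 11.25
print(psnr(ground_truth, prediction))  # higher means closer to the reference
```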
CV · Jun 26, 2019
Further advantages of data augmentation on convolutional neural networks
Alex Hernández-García, Peter König
Data augmentation is a popular technique widely used to enhance the training of convolutional neural networks. Although many of its benefits are well known by deep learning researchers and practitioners, its implicit regularization effects, as compared to popular explicit regularization techniques, such as weight decay and dropout, remain largely unstudied. As a matter of fact, convolutional neural networks for image object classification are typically trained with both data augmentation and explicit regularization, assuming the benefits of all techniques are complementary. In this paper, we systematically analyze these techniques through ablation studies of different network architectures trained with different amounts of training data. Our results unveil a largely ignored advantage of data augmentation: networks trained with just data augmentation more easily adapt to different architectures and amounts of training data, as opposed to weight decay and dropout, which require specific fine-tuning of their hyperparameters.
CV · Jun 11, 2019
Learning robust visual representations using data augmentation invariance
Alex Hernández-García, Peter König, Tim C. Kietzmann
Deep convolutional neural networks trained for image object categorization have shown remarkable similarities with representations found across the primate ventral visual stream. Yet, artificial and biological networks still exhibit important differences. Here we investigate one such property: increasing invariance to identity-preserving image transformations found along the ventral stream. Despite theoretical evidence that invariance should emerge naturally from the optimization process, we present empirical evidence that the activations of convolutional neural networks trained for object categorization are not robust to identity-preserving image transformations commonly used in data augmentation. As a solution, we propose data augmentation invariance, an unsupervised learning objective which improves the robustness of the learned representations by promoting the similarity between the activations of augmented image samples. Our results show that this approach is a simple, yet effective and efficient (10% increase in training time) way of increasing the invariance of the models while obtaining similar categorization performance.
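The abstract describes the objective only as promoting similarity between the activations of augmented samples of the same image. One plausible form, shown here as an assumption rather than the paper's exact loss, is the ratio of the average activation distance within each image's augmentations to the average distance across the whole batch. The function names and the activation values are hypothetical.

```python
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance between two activation vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_pairwise(vectors):
    """Average squared distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(sq_dist(u, v) for u, v in pairs) / len(pairs)


def invariance_loss(activations, image_ids):
    """Within-image distance divided by overall distance; lower is more invariant."""
    groups = {}
    for act, img in zip(activations, image_ids):
        groups.setdefault(img, []).append(act)
    within = [mean_pairwise(g) for g in groups.values() if len(g) > 1]
    if not within:
        raise ValueError("need at least two augmentations of some image")
    overall = mean_pairwise(activations)
    if overall == 0:
        raise ValueError("all activations are identical")
    return (sum(within) / len(within)) / overall


# Hypothetical layer activations: two augmentations each of images "a" and "b".
activations = [[1.0, 0.0], [1.1, 0.0], [0.0, 1.0], [0.0, 0.9]]
image_ids = ["a", "a", "b", "b"]

print(invariance_loss(activations, image_ids))  # small: augmentations stay close
```

Normalizing by the overall batch distance keeps the score from being driven to zero by collapsing all activations to one point, since that would also shrink the denominator. In training, a term like this would be added to the categorization loss.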
CV · Jun 11, 2018
Data augmentation instead of explicit regularization
Alex Hernández-García, Peter König
Contrary to most machine learning models, modern deep artificial neural networks typically include multiple components that contribute to regularization. Despite the fact that some (explicit) regularization techniques, such as weight decay and dropout, require costly fine-tuning of sensitive hyperparameters, the interplay between them and other elements that provide implicit regularization is not yet well understood. Shedding light upon these interactions is key to efficiently using computational resources and may contribute to solving the puzzle of generalization in deep learning. Here, we first provide formal definitions of explicit and implicit regularization that help understand essential differences between techniques. Second, we contrast data augmentation with weight decay and dropout. Our results show that visual object categorization models trained with data augmentation alone achieve the same or higher performance than models trained additionally with weight decay and dropout, as is common practice. We conclude that the contribution of weight decay and dropout to generalization is not only superfluous when sufficient implicit regularization is provided, but that such techniques can also dramatically deteriorate performance if the hyperparameters are not carefully tuned for the architecture and data set. In contrast, data augmentation systematically provides large generalization gains and does not require hyperparameter re-tuning. In view of our results, we suggest optimizing neural networks without weight decay and dropout to save computational resources, and hence carbon emissions, and focusing more on data augmentation and other inductive biases to improve performance and robustness.
CV · Feb 20, 2018
Do deep nets really need weight decay and dropout?
Alex Hernández-García, Peter König
The impressive success of modern deep neural networks on computer vision tasks has been achieved through models of very large capacity compared to the number of available training examples. This overparameterization is often said to be controlled with the help of different regularization techniques, mainly weight decay and dropout. However, since these techniques reduce the effective capacity of the model, typically even deeper and wider architectures are required to compensate for the reduced capacity. Therefore, there seems to be a waste of capacity in this practice. In this paper we build upon recent research that suggests that explicit regularization may not be as important as widely believed and carry out an ablation study that concludes that weight decay and dropout may not be necessary for object recognition if enough data augmentation is introduced.