LGJul 6, 2023
ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource AllocationAbhijeet Pendyala, Justin Dettmer, Tobias Glasmachers et al.
We present ContainerGym, a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task. The proposed benchmark encodes a range of challenges commonly encountered in real-world sequential decision making problems, such as uncertainty. It can be configured to instantiate problems of varying degrees of difficulty, e.g., in terms of variable dimensionality. Our benchmark differs from other reinforcement learning benchmarks, including the ones aiming to encode real-world difficulties, in that it is directly derived from a real-world industrial problem, which underwent minimal simplification and streamlining. It is sufficiently versatile to evaluate reinforcement learning algorithms on any real-world problem that fits our resource allocation framework. We provide results of standard baseline methods. Going beyond the usual training reward curves, our results and the statistical tools used to interpret them allow to highlight interesting limitations of well-known deep reinforcement learning algorithms, namely PPO, TRPO and DQN.
NCJun 21, 2022
ConTraNet: A single end-to-end hybrid network for EEG-based and EMG-based human machine interfacesOmair Ali, Muhammad Saif-ur-Rehman, Tobias Glasmachers et al.
Objective: Electroencephalography (EEG) and electromyography (EMG) are two non-invasive bio-signals, which are widely used in human machine interface (HMI) technologies (EEG-HMI and EMG-HMI paradigm) for the rehabilitation of physically disabled people. Successful decoding of EEG and EMG signals into respective control command is a pivotal step in the rehabilitation process. Recently, several Convolutional neural networks (CNNs) based architectures are proposed that directly map the raw time-series signal into decision space and the process of meaningful features extraction and classification are performed simultaneously. However, these networks are tailored to the learn the expected characteristics of the given bio-signal and are limited to single paradigm. In this work, we addressed the question that can we build a single architecture which is able to learn distinct features from different HMI paradigms and still successfully classify them. Approach: In this work, we introduce a single hybrid model called ConTraNet, which is based on CNN and Transformer architectures that is equally useful for EEG-HMI and EMG-HMI paradigms. ConTraNet uses CNN block to introduce inductive bias in the model and learn local dependencies, whereas the Transformer block uses the self-attention mechanism to learn the long-range dependencies in the signal, which are crucial for the classification of EEG and EMG signals. Main results: We evaluated and compared the ConTraNet with state-of-the-art methods on three publicly available datasets which belong to EEG-HMI and EMG-HMI paradigms. ConTraNet outperformed its counterparts in all the different category tasks (2-class, 3-class, 4-class, and 10-class decoding tasks). Significance: The results suggest that ConTraNet is robust to learn distinct features from different HMI paradigms and generalizes well as compared to the current state of the art algorithms.
LGJul 3, 2022
Recipe for Fast Large-scale SVM Training: Polishing, Parallelism, and more RAM!Tobias Glasmachers
Support vector machines (SVMs) are a standard method in the machine learning toolbox, in particular for tabular data. Non-linear kernel SVMs often deliver highly accurate predictors, however, at the cost of long training times. That problem is aggravated by the exponential growth of data volumes over time. It was tackled in the past mainly by two types of techniques: approximate solvers, and parallel GPU implementations. In this work, we combine both approaches to design an extremely fast dual SVM solver. We fully exploit the capabilities of modern compute servers: many-core architectures, multiple high-end GPUs, and large random access memory. On such a machine, we train a large-margin classifier on the ImageNet data set in 24 minutes.
LGOct 16, 2023
Leveraging Topological Maps in Deep Reinforcement Learning for Multi-Object NavigationSimon Hakenes, Tobias Glasmachers
This work addresses the challenge of navigating expansive spaces with sparse rewards through Reinforcement Learning (RL). Using topological maps, we elevate elementary actions to object-oriented macro actions, enabling a simple Deep Q-Network (DQN) agent to solve otherwise practically impossible environments.
LGAug 1, 2023
Understanding Activation Patterns in Artificial Neural Networks by Exploring Stochastic ProcessesStephan Johann Lehmler, Muhammad Saif-ur-Rehman, Tobias Glasmachers et al.
To gain a deeper understanding of the behavior and learning dynamics of (deep) artificial neural networks, it is valuable to employ mathematical abstractions and models. These tools provide a simplified perspective on network performance and facilitate systematic investigations through simulations. In this paper, we propose utilizing the framework of stochastic processes, which has been underutilized thus far. Our approach models activation patterns of thresholded nodes in (deep) artificial neural networks as stochastic processes. We focus solely on activation frequency, leveraging neuroscience techniques used for real neuron spike trains. During a classification task, we extract spiking activity and use an arrival process following the Poisson distribution. We examine observed data from various artificial neural networks in image recognition tasks, fitting the proposed model's assumptions. Through this, we derive parameters describing activation patterns in each network. Our analysis covers randomly initialized, generalizing, and memorizing networks, revealing consistent differences across architectures and training sets. Calculating Mean Firing Rate, Mean Fano Factor, and Variances, we find stable indicators of memorization during learning, providing valuable insights into network behavior. The proposed model shows promise in describing activation patterns and could serve as a general framework for future investigations. It has potential applications in theoretical simulations, pruning, and transfer learning.
LGDec 29, 2022
Invariance to Quantile Selection in Distributional Continuous ControlFelix Grün, Muhammad Saif-ur-Rehman, Tobias Glasmachers et al.
In recent years distributional reinforcement learning has produced many state of the art results. Increasingly sample efficient Distributional algorithms for the discrete action domain have been developed over time that vary primarily in the way they parameterize their approximations of value distributions, and how they quantify the differences between those distributions. In this work we transfer three of the most well-known and successful of those algorithms (QR-DQN, IQN and FQF) to the continuous action domain by extending two powerful actor-critic algorithms (TD3 and SAC) with distributional critics. We investigate whether the relative performance of the methods for the discrete action space translates to the continuous case. To that end we compare them empirically on the pybullet implementations of a set of continuous control tasks. Our results indicate qualitative invariance regarding the number and placement of distributional atoms in the deterministic, continuous action setting.
NEMar 23
Evolutionary Warm-Starts for Reinforcement Learning in Industrial Continuous ControlTom Maus, Stephan Frank, Tobias Glasmachers
Reinforcement learning (RL) is still rarely applied in industrial control, partly due to the difficulty of training reliable agents for real-world conditions. This work investigates how evolution strategies can support RL in such settings by introducing a continuous-control adaptation of an industrial sorting benchmark. The CMA-ES algorithm is used to generate high-quality demonstrations that warm-start RL agents. Results show that CMA-ES-guided initialization significantly improves stability and performance. Furthermore, the demonstration trajectories generated with the CMA-ES provide a strong oracle reference performance level, which is of interest in its own right. The study delivers a focused proof of concept for hybrid evolutionary-RL approaches and a basis for future, more complex industrial applications.
LGJul 1, 2025
Leveraging Genetic Algorithms for Efficient Demonstration Generation in Real-World Reinforcement Learning EnvironmentsTom Maus, Asma Atamna, Tobias Glasmachers
Reinforcement Learning (RL) has demonstrated significant potential in certain real-world industrial applications, yet its broader deployment remains limited by inherent challenges such as sample inefficiency and unstable learning dynamics. This study investigates the utilization of Genetic Algorithms (GAs) as a mechanism for improving RL performance in an industrially inspired sorting environment. We propose a novel approach in which GA-generated expert demonstrations are used to enhance policy learning. These demonstrations are incorporated into a Deep Q-Network (DQN) replay buffer for experience-based learning and utilized as warm-start trajectories for Proximal Policy Optimization (PPO) agents to accelerate training convergence. Our experiments compare standard RL training with rule-based heuristics, brute-force optimization, and demonstration data, revealing that GA-derived demonstrations significantly improve RL performance. Notably, PPO agents initialized with GA-generated data achieved superior cumulative rewards, highlighting the potential of hybrid learning paradigms, where heuristic search methods complement data-driven RL. The utilized framework is publicly available and enables further research into adaptive RL strategies for real-world applications.
LGApr 3, 2024
Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward EngineeringAbhijeet Pendyala, Asma Atamna, Tobias Glasmachers
We present a proximal policy optimization (PPO) agent trained through curriculum learning (CL) principles and meticulous reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of effectively balancing the competing objectives of operational safety, volume optimization, and minimizing resource usage. A vanilla agent trained from scratch on these multiple criteria fails to solve the problem due to its inherent complexities. This problem is particularly difficult due to the environment's extremely delayed rewards with long time horizons and class (or action) imbalance, with important actions being infrequent in the optimal policy. This forces the agent to anticipate long-term action consequences and prioritize rare but rewarding behaviours, creating a non-trivial reinforcement learning task. Our five-stage CL approach tackles these challenges by gradually increasing the complexity of the environmental dynamics during policy transfer while simultaneously refining the reward mechanism. This iterative and adaptable process enables the agent to learn a desired optimal policy. Results demonstrate that our approach significantly improves inference-time safety, achieving near-zero safety violations in addition to enhancing waste sorting plant efficiency.
CVFeb 29, 2024
ProtoP-OD: Explainable Object Detection with Prototypical PartsPavlos Rath-Manakidis, Frederik Strothmann, Tobias Glasmachers et al.
Interpretation and visualization of the behavior of detection transformers tends to highlight the locations in the image that the model attends to, but it provides limited insight into the \emph{semantics} that the model is focusing on. This paper introduces an extension to detection transformers that constructs prototypical local features and uses them in object detection. These custom features, which we call prototypical parts, are designed to be mutually exclusive and align with the classifications of the model. The proposed extension consists of a bottleneck module, the prototype neck, that computes a discretized representation of prototype activations and a new loss term that matches prototypes to object classes. This setup leads to interpretable representations in the prototype neck, allowing visual inspection of the image content perceived by the model and a better understanding of the model's reliability. We show experimentally that our method incurs only a limited performance penalty, and we provide examples that demonstrate the quality of the explanations provided by our method, which we argue outweighs the performance penalty.
LGApr 25, 2025
Deep Reinforcement Learning Based Navigation with Macro Actions and Topological MapsSimon Hakenes, Tobias Glasmachers
This paper addresses the challenge of navigation in large, visually complex environments with sparse rewards. We propose a method that uses object-oriented macro actions grounded in a topological map, allowing a simple Deep Q-Network (DQN) to learn effective navigation policies. The agent builds a map by detecting objects from RGBD input and selecting discrete macro actions that correspond to navigating to these objects. This abstraction drastically reduces the complexity of the underlying reinforcement learning problem and enables generalization to unseen environments. We evaluate our approach in a photorealistic 3D simulation and show that it significantly outperforms a random baseline under both immediate and terminal reward conditions. Our results demonstrate that topological structure and macro-level abstraction can enable sample-efficient learning even from pixel data.
LGMar 13, 2025
SortingEnv: An Extendable RL-Environment for an Industrial Sorting ProcessTom Maus, Nico Zengeler, Tobias Glasmachers
We present a novel reinforcement learning (RL) environment designed to both optimize industrial sorting systems and study agent behavior in evolving spaces. In simulating material flow within a sorting process our environment follows the idea of a digital twin, with operational parameters like belt speed and occupancy level. To reflect real-world challenges, we integrate common upgrades to industrial setups, like new sensors or advanced machinery. It thus includes two variants: a basic version focusing on discrete belt speed adjustments and an advanced version introducing multiple sorting modes and enhanced material composition observations. We detail the observation spaces, state update mechanisms, and reward functions for both environments. We further evaluate the efficiency of common RL algorithms like Proximal Policy Optimization (PPO), Deep-Q-Networks (DQN), and Advantage Actor Critic (A2C) in comparison to a classical rule-based agent (RBA). This framework not only aids in optimizing industrial processes but also provides a foundation for studying agent behavior and transferability in evolving environments, offering insights into model performance and practical implications for real-world RL applications.
NEDec 16, 2024
Deep-learning-based identification of individual motion characteristics from upper-limb trajectories towards disorder stage evaluationTim Sziburis, Susanne Blex, Tobias Glasmachers et al.
The identification of individual movement characteristics sets the foundation for the assessment of personal rehabilitation progress and can provide diagnostic information on levels and stages of movement disorders. This work presents a preliminary study for differentiating individual motion patterns using a dataset of 3D upper-limb transport trajectories measured in task-space. Identifying individuals by deep time series learning can be a key step to abstracting individual motion properties. In this study, a classification accuracy of about 95% is reached for a subset of nine, and about 78% for the full set of 31 individuals. This provides insights into the separability of patient attributes by exerting a simple standardized task to be transferred to portable systems.
LGOct 23, 2025
Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial ControlTom Maus, Asma Atamna, Tobias Glasmachers
Autonomous control of multi-stage industrial processes requires both local specialization and global coordination. Reinforcement learning (RL) offers a promising approach, but its industrial adoption remains limited due to challenges such as reward design, modularity, and action space management. Many academic benchmarks differ markedly from industrial control problems, limiting their transferability to real-world applications. This study introduces an enhanced industry-inspired benchmark environment that combines tasks from two existing benchmarks, SortingEnv and ContainerGym, into a sequential recycling scenario with sorting and pressing operations. We evaluate two control strategies: a modular architecture with specialized agents and a monolithic agent governing the full system, while also analyzing the impact of action masking. Our experiments show that without action masking, agents struggle to learn effective policies, with the modular architecture performing better. When action masking is applied, both architectures improve substantially, and the performance gap narrows considerably. These results highlight the decisive role of action space constraints and suggest that the advantages of specialization diminish as action complexity is reduced. The proposed benchmark thus provides a valuable testbed for exploring practical and robust multi-agent RL solutions in industrial automation, while contributing to the ongoing debate on centralization versus specialization.
LGAug 7, 2025
Cumulative Learning Rate Adaptation: Revisiting Path-Based Schedules for SGD and AdamAsma Atamna, Tom Maus, Fabian Kievelitz et al.
The learning rate is a crucial hyperparameter in deep learning, with its ideal value depending on the problem and potentially changing during training. In this paper, we investigate the practical utility of adaptive learning rate mechanisms that adjust step sizes dynamically in response to the loss landscape. We revisit a cumulative path-based adaptation scheme proposed in 2017, which adjusts the learning rate based on the discrepancy between the observed path length, computed as a time-discounted sum of normalized gradient steps, and the expected length of a random walk. While the original approach offers a compelling intuition, we show that its adaptation mechanism for Adam is conceptually inconsistent due to the optimizer's internal preconditioning. We propose a corrected variant that better reflects Adam's update dynamics. To assess the practical value of online learning rate adaptation, we benchmark SGD and Adam, with and without cumulative adaptation, and compare them to a recent alternative method. Our results aim to clarify when and why such adaptive strategies offer practical benefits.
LGMar 21, 2025
Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management ProblemAbhijeet Pendyala, Tobias Glasmachers
In this work, we augment reinforcement learning with an inference-time collision model to ensure safe and efficient container management in a waste-sorting facility with limited processing capacity. Each container has two optimal emptying volumes that trade off higher throughput against overflow risk. Conventional reinforcement learning (RL) approaches struggle under delayed rewards, sparse critical events, and high-dimensional uncertainty -- failing to consistently balance higher-volume empties with the risk of safety-limit violations. To address these challenges, we propose a hybrid method comprising: (1) a curriculum-learning pipeline that incrementally trains a PPO agent to handle delayed rewards and class imbalance, and (2) an offline pairwise collision model used at inference time to proactively avert collisions with minimal online cost. Experimental results show that our targeted inference-time collision checks significantly improve collision avoidance, reduce safety-limit violations, maintain high throughput, and scale effectively across varying container-to-PU ratios. These findings offer actionable guidelines for designing safe and efficient container-management systems in real-world facilities.
LGJan 27, 2022
From Motion to MuscleMarie D. Schmidt, Tobias Glasmachers, Ioannis Iossifidis
Voluntary human motion is the product of muscle activity that results from upstream motion planning of the motor cortical areas. We show that muscle activity can be artificially generated based on motion features such as position, velocity, and acceleration. For this purpose, we specifically develop an approach based on a recurrent neural network trained in a supervised learning session; additional neural network architectures are considered and evaluated. The performance is evaluated by a new score called the zero-line score. The latter adaptively rescales the loss function of the generated signal for all channels by comparing the overall range of muscle activity and thus dynamically evaluates similarities between both signals. The model achieves a remarkable precision for previously trained motion while new motions that were not trained before still have high accuracy. Further, these models are trained on multiple subjects and thus are able to generalize across individuals. In addition, we distinguish between a general model that has been trained on several subjects, a subject-specific model, and a specific pre-trained model that uses the general model as a basis and is adapted to a specific subject afterward. The subject-specific generation of muscle activity can be further exploited to improve the rehabilitation of neuromuscular diseases with myoelectric prostheses and functional electric stimulation.
LGDec 30, 2021
Deep Transfer-Learning for patient specific model re-calibration: Application to sEMG-ClassificationStephan Johann Lehmler, Muhammad Saif-ur-Rehman, Tobias Glasmachers et al.
Accurate decoding of surface electromyography (sEMG) is pivotal for muscle-to-machine-interfaces (MMI) and their application for e.g. rehabilitation therapy. sEMG signals have high inter-subject variability, due to various factors, including skin thickness, body fat percentage, and electrode placement. Therefore, obtaining high generalization quality of a trained sEMG decoder is quite challenging. Usually, machine learning based sEMG decoders are either trained on subject-specific data, or at least recalibrated for each user, individually. Even though, deep learning algorithms produced several state of the art results for sEMG decoding,however, due to the limited amount of availability of sEMG data, the deep learning models are prone to overfitting. Recently, transfer learning for domain adaptation improved generalization quality with reduced training time on various machine learning tasks. In this study, we investigate the effectiveness of transfer learning using weight initialization for recalibration of two different pretrained deep learning models on a new subjects data, and compare their performance to subject-specific models. To the best of our knowledge, this is the first study that thoroughly investigated weight-initialization based transfer learning for sEMG classification and compared transfer learning to subject-specific modeling. We tested our models on three publicly available databases under various settings. On average over all settings, our transfer learning approach improves 5~\%-points on the pretrained models without fine-tuning and 12~\%-points on the subject-specific models, while being trained on average 22~\% fewer epochs. Our results indicate that transfer learning enables faster training on fewer samples than user-specific models, and improves the performance of pretrained models as long as enough data is available.
NEDec 1, 2021
The (1+1)-ES Reliably Overcomes Saddle PointsTobias Glasmachers
It is known that step size adaptive evolution strategies (ES) do not converge (prematurely) to regular points of continuously differentiable objective functions. Among critical points, convergence to minima is desired, and convergence to maxima is easy to exclude. However, surprisingly little is known on whether ES can get stuck at a saddle point. In this work we establish that even the simple (1+1)-ES reliably overcomes most saddle points under quite mild regularity conditions. Our analysis is based on drift with tail bounds. It is non-standard in that we do not even aim to estimate hitting times based on drift. Rather, in our case it suffices to show that the relevant time is finite with full probability.
QMNov 30, 2020
Anchored-STFT and GNAA: An extension of STFT in conjunction with an adversarial data augmentation technique for the decoding of neural signalsOmair Ali, Muhammad Saif-ur-Rehman, Susanne Dyck et al.
Brain-computer interfaces (BCIs) enable communication between humans and machines by translating brain activity into control commands. Electroencephalography (EEG) signals are one of the most used brain signals in non-invasive BCI applications but are often contaminated with noise. Therefore, it is possible that meaningful patterns for classifying EEG signals are deeply hidden. State-of-the-art deep-learning algorithms are successful in learning hidden, meaningful patterns. However, the quality and the quantity of the presented inputs is pivotal. Here, we propose a novel feature extraction method called anchored Short Time Fourier Transform (anchored-STFT), which is an advanced version of STFT, as it minimizes the trade-off between temporal and spectral resolution presented by STFT. In addition, we propose a novel augmentation method, called gradient norm adversarial augmentation (GNAA). GNAA is not only an augmentation method but is also used to harness adversarial inputs in EEG data, which not only improves the classification accuracy but also enhances the robustness of the classifier. In addition, we also propose a new CNN architecture, namely Skip-Net, for the classification of EEG signals. The proposed pipeline outperforms all state-of-the-art methods and yields an average classification accuracy of 90.7 % and 89.54 % on BCI competition II dataset III and BCI competition IV dataset 2b, respectively.
OCNov 11, 2020
Non-local Optimization: Imposing Structure on Optimization Problems by RelaxationNils Müller, Tobias Glasmachers
In stochastic optimization, particularly in evolutionary computation and reinforcement learning, the optimization of a function $f: Ω\to \mathbb{R}$ is often addressed through optimizing a so-called relaxation $θ\in Θ\mapsto \mathbb{E}_θ(f)$ of $f$, where $Θ$ resembles the parameters of a family of probability measures on $Ω$. We investigate the structure of such relaxations by means of measure theory and Fourier analysis, enabling us to shed light on the success of many associated stochastic optimization methods. The main structural traits we derive and that allow fast and reliable optimization of relaxations are the consistency of optimal values of $f$, Lipschitzness of gradients, and convexity. We emphasize settings where $f$ itself is not differentiable or convex, e.g., in the presence of (stochastic) disturbance.
LGSep 20, 2020
Latent Representation Prediction NetworksHlynur Davíð Hlynsson, Merlin Schüler, Robin Schiewer et al.
Deeply-learned planning methods are often based on learning representations that are optimized for unrelated tasks. For example, they might be trained on reconstructing the environment. These representations are then combined with predictor functions for simulating rollouts to navigate the environment. We find this principle of learning representations unsatisfying and propose to learn them such that they are directly optimized for the task at hand: to be maximally predictable for the predictor function. This results in representations that are by design optimal for the downstream task of planning, where the learned predictor function is used as a forward model. To this end, we propose a new way of jointly learning this representation along with the prediction function, a system we dub Latent Representation Prediction Network (LARP). The prediction function is used as a forward model for search on a graph in a viewpoint-matching task and the representation learned to maximize predictability is found to outperform a pre-trained representation. Our approach is shown to be more sample-efficient than standard reinforcement learning methods and our learned representation transfers successfully to dissimilar objects.
CVSep 14, 2020
Methods of the Vehicle Re-identificationMohamed Nafzi, Michael Brauckmann, Tobias Glasmachers
Most of researchers use the vehicle re-identification based on classification. This always requires an update with the new vehicle models in the market. In this paper, two types of vehicle re-identification will be presented. First, the standard method, which needs an image from the search vehicle. VRIC and VehicleID data set are suitable for training this module. It will be explained in detail how to improve the performance of this method using a trained network, which is designed for the classification. The second method takes as input a representative image of the search vehicle with similar make/model, released year and colour. It is very useful when an image from the search vehicle is not available. It produces as output a shape and a colour features. This could be used by the matching across a database to re-identify vehicles, which look similar to the search vehicle. To get a robust module for the re-identification, a fine-grained classification has been trained, which its class consists of four elements: the make of a vehicle refers to the vehicle's manufacturer, e.g. Mercedes-Benz, the model of a vehicle refers to type of model within that manufacturer's portfolio, e.g. C Class, the year refers to the iteration of the model, which may receive progressive alterations and upgrades by its manufacturer and the perspective of the vehicle. Thus, all four elements describe the vehicle at increasing degree of specificity. The aim of the vehicle shape classification is to classify the combination of these four elements. The colour classification has been separately trained. The results of vehicle re-identification will be shown. Using a developed tool, the re-identification of vehicles on video images and on controlled data set will be demonstrated. This work was partially funded under the grant.
CVSep 14, 2020
Data Augmentation and Clustering for Vehicle Make/Model ClassificationMohamed Nafzi, Michael Brauckmann, Tobias Glasmachers
Vehicle shape information is very important in Intelligent Traffic Systems (ITS). In this paper we present a way to exploit a training data set of vehicles released in different years and captured under different perspectives. Also the efficacy of clustering to enhance the make/model classification is presented. Both steps led to improved classification results and a greater robustness. Deeper convolutional neural network based on ResNet architecture has been designed for the training of the vehicle make/model classification. The unequal class distribution of training data produces an a priori probability. Its elimination, obtained by removing of the bias and through hard normalization of the centroids in the classification layer, improves the classification results. A developed application has been used to test the vehicle re-identification on video data manually based on make/model and color classification. This work was partially funded under the grant.
OCSep 6, 2020
Convergence Analysis of the Hessian Estimation Evolution StrategyTobias Glasmachers, Oswin Krause
The class of algorithms called Hessian Estimation Evolution Strategies (HE-ESs) update the covariance matrix of their sampling distribution by directly estimating the curvature of the objective function. The approach is practically efficient, as attested by respectable performance on the BBOB testbed, even on rather irregular functions. In this paper we formally prove two strong guarantees for the (1+4)-HE-ES, a minimal elitist member of the family: stability of the covariance matrix update, and as a consequence, linear convergence on all convex quadratic problems at a rate that is independent of the problem instance.
LGApr 16, 2020
Analyzing Reinforcement Learning Benchmarks with Random Weight GuessingDeclan Oller, Tobias Glasmachers, Giuseppe Cuccu
We propose a novel method for analyzing and visualizing the complexity of standard reinforcement learning (RL) benchmarks based on score distributions. A large number of policy networks are generated by randomly guessing their parameters, and then evaluated on the benchmark task; the study of their aggregated results provide insights into the benchmark complexity. Our method guarantees objectivity of evaluation by sidestepping learning altogether: the policy network parameters are generated using Random Weight Guessing (RWG), making our method agnostic to (i) the classic RL setup, (ii) any learning algorithm, and (iii) hyperparameter tuning. We show that this approach isolates the environment complexity, highlights specific types of challenges, and provides a proper foundation for the statistical analysis of the task's difficulty. We test our approach on a variety of classic control benchmarks from the OpenAI Gym, where we show that small untrained networks can provide a robust baseline for a variety of tasks. The networks generated often show good performance even without gradual learning, incidentally highlighting the triviality of a few popular benchmarks.
LGMar 30, 2020
The Hessian Estimation Evolution StrategyTobias Glasmachers, Oswin Krause
We present a novel black box optimization algorithm called Hessian Estimation Evolution Strategy. The algorithm updates the covariance matrix of its sampling distribution by directly estimating the curvature of the objective function. This algorithm design is targeted at twice continuously differentiable problems. For this, we extend the cumulative step-size adaptation algorithm of the CMA-ES to mirrored sampling. We demonstrate that our approach to covariance matrix adaptation is efficient by evaluation it on the BBOB/COCO testbed. We also show that the algorithm is surprisingly robust when its core assumption of a twice continuously differentiable objective function is violated. The approach yields a new evolution strategy with competitive performance, and at the same time it also offers an interesting alternative to the usual covariance matrix update mechanism.
CVMay 15, 2019
Vehicle Shape and Color Classification Using Convolutional Neural NetworkMohamed Nafzi, Michael Brauckmann, Tobias Glasmachers
This paper presents a module of vehicle reidentification based on make/model and color classification. It could be used by the Automated Vehicular Surveillance (AVS) or by the fast analysis of video data. Many of problems, that are related to this topic, had to be addressed. In order to facilitate and accelerate the progress in this subject, we will present our way to collect and to label a large scale data set. We used deeper neural networks in our training. They showed a good classification accuracy. We show the results of make/model and color classification on controlled and video data set. We demonstrate with the help of a developed application the re-identification of vehicles on video images based on make/model and color classification. This work was partially funded under the grant.
NEOct 23, 2018
Challenges of Convex Quadratic Bi-objective Benchmark ProblemsTobias Glasmachers
Convex quadratic objective functions are an important base case in state-of-the-art benchmark collections for single-objective optimization on continuous domains. Although often considered rather simple, they represent the highly relevant challenges of non-separability and ill-conditioning. In the multi-objective case, quadratic benchmark problems are under-represented. In this paper we analyze the specific challenges that can be posed by quadratic functions in the bi-objective case. Our construction yields a full factorial design of 54 different problem classes. We perform experiments with well-established algorithms to demonstrate the insights that can be supported by this function class. We find huge performance differences, which can be clearly attributed to two root causes: non-separability and alignment of the Pareto set with the coordinate system.
LGJun 26, 2018
Dual SVM Training on a BudgetSahar Qaadan, Merlin Schüler, Tobias Glasmachers
We present a dual subspace ascent algorithm for support vector machine training that respects a budget constraint limiting the number of support vectors. Budget methods are effective for reducing the training time of kernel SVM while retaining high accuracy. To date, budget training is available only for primal (SGD-based) solvers. Dual subspace ascent methods like sequential minimal optimization are attractive for their good adaptation to the problem structure, their fast convergence rate, and their practical speed. By incorporating a budget constraint into a dual algorithm, our method enjoys the best of both worlds. We demonstrate considerable speed-ups over primal budget training methods.
LGJun 26, 2018
Speeding Up Budgeted Stochastic Gradient Descent SVM Training with Precomputed Golden Section SearchTobias Glasmachers, Sahar Qaadan
Limiting the model size of a kernel support vector machine to a pre-defined budget is a well-established technique that allows to scale SVM learning and prediction to large-scale data. Its core addition to simple stochastic gradient training is budget maintenance through merging of support vectors. This requires solving an inner optimization problem with an iterative method many times per gradient step. In this paper we replace the iterative procedure with a fast lookup. We manage to reduce the merging time by up to 65% and the total training time by 44% without any loss of accuracy.
LGJun 26, 2018
Multi-Merge Budget Maintenance for Stochastic Gradient Descent SVM TrainingSahar Qaadan, Tobias Glasmachers
Budgeted Stochastic Gradient Descent (BSGD) is a state-of-the-art technique for training large-scale kernelized support vector machines. The budget constraint is maintained incrementally by merging two points whenever the pre-defined budget is exceeded. The process of finding suitable merge partners is costly; it can account for up to 45% of the total training time. In this paper we investigate computationally more efficient schemes that merge more than two points at once. We obtain significant speed-ups without sacrificing accuracy.
NEJun 4, 2018
Challenges in High-dimensional Reinforcement Learning with Evolution StrategiesNils Müller, Tobias Glasmachers
Evolution Strategies (ESs) have recently become popular for training deep neural networks, in particular on reinforcement learning tasks, a special form of controller design. Compared to classic problems in continuous direct search, deep networks pose extremely high-dimensional optimization problems, with many thousands or even millions of variables. In addition, many control problems give rise to a stochastic fitness function. Considering the relevance of the application, we study the suitability of evolution strategies for high-dimensional, stochastic problems. Our results give insights into which algorithmic mechanisms of modern ES are of value for the class of problems at hand, and they reveal principled limitations of the approach. They are in line with our theoretical understanding of ESs. We show that combining ESs that offer reduced internal algorithm cost with uncertainty handling techniques yields promising methods for this class of problems.
NEFeb 9, 2018
Drift Theory in Continuous Search Spaces: Expected Hitting Time of the (1+1)-ES with 1/5 Success RuleYouhei Akimoto, Anne Auger, Tobias Glasmachers
This paper explores the use of the standard approach for proving runtime bounds in discrete domains---often referred to as drift analysis---in the context of optimization on a continuous domain. Using this framework we analyze the (1+1) Evolution Strategy with one-fifth success rule on the sphere function. To deal with potential functions that are not lower-bounded, we formulate novel drift theorems. We then use the theorems to prove bounds on the expected hitting time to reach a certain target fitness in finite dimension $d$. The bounds are akin to linear convergence. We then study the dependency of the different terms on $d$ proving a convergence rate dependency of $Θ(1/d)$. Our results constitute the first non-asymptotic analysis for the algorithm considered as well as the first explicit application of drift analysis to a randomized search heuristic with continuous domain.
NEJun 9, 2017
Global Convergence of the (1+1) Evolution StrategyTobias Glasmachers
We establish global convergence of the (1+1) evolution strategy, i.e., convergence to a critical point independent of the initial state. More precisely, we show the existence of a critical limit point, using a suitable extension of the notion of a critical point to measurable functions. At its core, the analysis is based on a novel progress guarantee for elitist, rank-based evolutionary algorithms. By applying it to the (1+1) evolution strategy we are able to provide an accurate characterization of whether global convergence is guaranteed with full probability, or whether premature convergence is possible. We illustrate our results on a number of example applications ranging from smooth (non-convex) cases over different types of saddle points and ridge functions to discontinuous and extremely rugged problems.
NEMay 18, 2017
Limited-Memory Matrix Adaptation for Large Scale Black-box OptimizationIlya Loshchilov, Tobias Glasmachers, Hans-Georg Beyer
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a popular method to deal with nonconvex and/or stochastic optimization problems when the gradient information is not available. Being based on the CMA-ES, the recently proposed Matrix Adaptation Evolution Strategy (MA-ES) provides a rather surprising result that the covariance matrix and all associated operations (e.g., potentially unstable eigendecomposition) can be replaced in the CMA-ES by a updated transformation matrix without any loss of performance. In order to further simplify MA-ES and reduce its $\mathcal{O}\big(n^2\big)$ time and storage complexity to $\mathcal{O}\big(n\log(n)\big)$, we present the Limited-Memory Matrix Adaptation Evolution Strategy (LM-MA-ES) for efficient zeroth order large-scale optimization. The algorithm demonstrates state-of-the-art performance on a set of established large-scale benchmarks. We explore the algorithm on the problem of generating adversarial inputs for a (non-smooth) random forest classifier, demonstrating a surprising vulnerability of the classifier.
LGApr 26, 2017
Limits of End-to-End LearningTobias Glasmachers
End-to-end learning refers to training a possibly complex learning system by applying gradient-based learning to the system as a whole. End-to-end learning system is specifically designed so that all modules are differentiable. In effect, not only a central learning machine, but also all "peripheral" modules like representation learning and memory formation are covered by a holistic learning process. The power of end-to-end learning has been demonstrated on many tasks, like playing a whole array of Atari video games with a single architecture. While pushing for solutions to more challenging tasks, network architectures keep growing more and more complex. In this paper we ask the question whether and to what extent end-to-end learning is a future-proof technique in the sense of scaling to complex and diverse data processing architectures. We point out potential inefficiencies, and we argue in particular that end-to-end learning does not make optimal use of the modular design of present neural networks. Our surprisingly simple experiments demonstrate these inefficiencies, up to the complete breakdown of learning.
NEMay 9, 2016
Anytime Bi-Objective Optimization with a Hybrid Multi-Objective CMA-ES (HMO-CMA-ES)Ilya Loshchilov, Tobias Glasmachers
We propose a multi-objective optimization algorithm aimed at achieving good anytime performance over a wide range of problems. Performance is assessed in terms of the hypervolume metric. The algorithm called HMO-CMA-ES represents a hybrid of several old and new variants of CMA-ES, complemented by BOBYQA as a warm start. We benchmark HMO-CMA-ES on the recently introduced bi-objective problem suite of the COCO framework (COmparing Continuous Optimizers), consisting of 55 scalable continuous optimization problems, which is used by the Black-Box Optimization Benchmarking (BBOB) Workshop 2016.
MLFeb 10, 2016
Fast model selection by limiting SVM training timesAydin Demircioglu, Daniel Horn, Tobias Glasmachers et al.
Kernelized Support Vector Machines (SVMs) are among the best performing supervised learning methods. But for optimal predictive performance, time-consuming parameter tuning is crucial, which impedes application. To tackle this problem, the classic model selection procedure based on grid-search and cross-validation was refined, e.g. by data subsampling and direct search heuristics. Here we focus on a different aspect, the stopping criterion for SVM training. We show that by limiting the training time given to the SVM solver during parameter tuning we can reduce model selection times by an order of magnitude.
MLJan 15, 2014
Coordinate Descent with Online Adaptation of Coordinate FrequenciesTobias Glasmachers, Ürün Dogan
Coordinate descent (CD) algorithms have become the method of choice for solving a number of optimization problems in machine learning. They are particularly popular for training linear models, including linear support vector machine classification, LASSO regression, and logistic regression. We consider general CD with non-uniform selection of coordinates. Instead of fixing selection frequencies beforehand we propose an online adaptation mechanism for this important parameter, called the adaptive coordinate frequencies (ACF) method. This mechanism removes the need to estimate optimal coordinate frequencies beforehand, and it automatically reacts to changing requirements during an optimization run. We demonstrate the usefulness of our ACF-CD approach for a variety of optimization problems arising in machine learning contexts. Our algorithm offers significant speed-ups over state-of-the-art training methods.
LGJul 31, 2013
The Planning-ahead SMO AlgorithmTobias Glasmachers
The sequential minimal optimization (SMO) algorithm and variants thereof are the de facto standard method for solving large quadratic programs for support vector machine (SVM) training. In this paper we propose a simple yet powerful modification. The main emphasis is on an algorithm improving the SMO step size by planning-ahead. The theoretical analysis ensures its convergence to the optimum. Experiments involving a large number of datasets were carried out to demonstrate the superiority of the new algorithm.
LGMay 2, 2013
Testing Hypotheses by Regularized Maximum Mean DiscrepancySomayeh Danafar, Paola M. V. Rancoita, Tobias Glasmachers et al.
Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare distributions by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis testing, yields substantial improvements even when sample sizes are small, and excels at hypothesis tests involving multiple comparisons with power control. We derive asymptotic distributions under the null and alternative hypotheses, and assess power control. Outstanding results are obtained on: challenging EEG data, MNIST, the Berkley Covertype, and the Flare-Solar dataset.
MLFeb 22, 2013
Accelerated Linear SVM Training with Adaptive Variable Selection FrequenciesTobias Glasmachers, Ürün Dogan
Support vector machine (SVM) training is an active research area since the dawn of the method. In recent years there has been increasing interest in specialized solvers for the important case of linear models. The algorithm presented by Hsieh et al., probably best known under the name of the "liblinear" implementation, marks a major breakthrough. The method is analog to established dual decomposition algorithms for training of non-linear SVMs, but with greatly reduced computational complexity per update step. This comes at the cost of not keeping track of the gradient of the objective any more, which excludes the application of highly developed working set selection algorithms. We present an algorithmic improvement to this method. We replace uniform working set selection with an online adaptation of selection frequencies. The adaptation criterion is inspired by modern second order working set selection methods. The same mechanism replaces the shrinking heuristic. This novel technique speeds up training in some cases by more than an order of magnitude.