Lars Schmidt-Thieme

h-index: 48

74 papers

17,916 citations

Novelty: 50%

AI Score: 58

Ranked #4,809 of 194,257 authors (top 2%); #1,320 in LG (top 3%)

74 Papers

1.5 | CV | Jun 5, 2023 | Code

DeepStay: Stay Region Extraction from Location Trajectories using Weak Supervision

Christian Löwens, Daniela Thyssens, Emma Andersson et al.

Nowadays, mobile devices enable constant tracking of the user's position, and location trajectories can be used to infer personal points of interest (POIs) such as homes, workplaces, or stores. A common way to extract POIs is to first identify spatio-temporal regions where a user spends a significant amount of time, known as stay regions (SRs). Existing approaches to SR extraction are evaluated either in a purely unsupervised manner or on small-scale private datasets, because popular public datasets are unlabeled. Most of these methods rely on hand-crafted features or thresholds and do not learn beyond hyperparameter optimization. We therefore propose DeepStay, a weakly and self-supervised transformer-based model that is trained on location trajectories to predict stay regions. To the best of our knowledge, this is the first approach based on deep learning and the first that is evaluated on a public, labeled dataset. Our SR extraction method outperforms state-of-the-art methods. In addition, we conducted a limited experiment on transportation mode detection from GPS trajectories using the same architecture and achieved significantly higher scores than the state of the art. Our code is available at https://github.com/christianll9/deepstay.
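
The hand-crafted, threshold-based baselines that DeepStay is contrasted with can be illustrated with the classic distance/time stay-point rule. The sketch below is that kind of baseline, not DeepStay itself; the function name `detect_stay_points`, the use of projected metre coordinates, and both thresholds are illustrative assumptions.

```python
import math
from typing import List, Tuple

# (x_m, y_m, t_s): projected coordinates in metres and a timestamp in seconds (assumed layout)
Point = Tuple[float, float, float]


def detect_stay_points(points: List[Point], dist_thresh: float, time_thresh: float):
    """Threshold-based stay-point detection (hand-crafted baseline, not DeepStay).

    Returns a list of (mean_x, mean_y, t_arrive, t_leave) tuples.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Grow the window while points remain within dist_thresh of the anchor point i.
        while j < n and math.dist(points[i][:2], points[j][:2]) <= dist_thresh:
            j += 1
        # points[i:j] all lie within dist_thresh of points[i]; check how long the user stayed.
        if points[j - 1][2] - points[i][2] >= time_thresh:
            window = points[i:j]
            mx = sum(p[0] for p in window) / len(window)
            my = sum(p[1] for p in window) / len(window)
            stays.append((mx, my, window[0][2], window[-1][2]))
            i = j  # continue scanning after the detected stay
        else:
            i += 1
    return stays
```

Both `dist_thresh` and `time_thresh` must be tuned by hand, which is exactly the kind of fixed rule a learned model aims to replace.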

19.5 | IR | Apr 4, 2022 | Code

CARCA: Context and Attribute-Aware Next-Item Recommendation via Cross-Attention

Ahmed Rashed, Shereen Elsayed, Lars Schmidt-Thieme

In sparse recommender settings, users' context and item attributes play a crucial role in deciding which items to recommend next. Despite that, recent works in sequential and time-aware recommendation usually either ignore both aspects or consider only one of them, limiting their predictive performance. In this paper, we address these limitations by proposing a context- and attribute-aware recommender model (CARCA) that captures the dynamic nature of user profiles in terms of contextual features and item attributes via dedicated multi-head self-attention blocks that extract profile-level features and predict item scores. Moreover, unlike many current state-of-the-art sequential item recommendation approaches, which score target items with a simple dot product between the most recent item's latent features and the target items' embeddings, CARCA uses cross-attention between all profile items and the target items to predict their final scores. This cross-attention allows CARCA to exploit the correlation between old and recent items in the user profile and their influence on deciding which item to recommend next. Experiments on four real-world recommender system datasets show that the proposed model significantly outperforms all state-of-the-art models in the task of item recommendation, achieving improvements of up to 53% in Normalized Discounted Cumulative Gain (NDCG) and Hit-Ratio. Results also show that CARCA outperformed several state-of-the-art dedicated image-based recommender systems merely by using image attributes extracted from a pre-trained ResNet50 in a black-box fashion.
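
The difference between last-item dot-product scoring and cross-attention scoring can be sketched in a few lines. This is a deliberately simplified, hypothetical single-head version without learned projections, feed-forward layers, or context features; in particular, using the dot product of the attended profile vector with the target embedding as the final score is an assumption for illustration, not CARCA's exact output layer.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention_scores(profile: List[Vec], targets: List[Vec]) -> List[float]:
    """Score each target by attending over ALL profile items (single head, no projections)."""
    d = len(profile[0])
    scores = []
    for q in targets:
        # Attention weights of this target (query) over every profile item (keys).
        weights = softmax([dot(q, k) / math.sqrt(d) for k in profile])
        # Target-specific weighted summary of the profile (values = the item embeddings).
        context = [sum(w * item[j] for w, item in zip(weights, profile)) for j in range(d)]
        scores.append(dot(context, q))
    return scores


def last_item_scores(profile: List[Vec], targets: List[Vec]) -> List[float]:
    """Simpler baseline: dot product with the most recent profile item only."""
    return [dot(profile[-1], q) for q in targets]
```

With `profile = [[1, 0], [0, 1]]` and the same two vectors as targets, the last-item rule gives the first target a score of 0 because only the newest item is consulted, whereas cross-attention gives both targets the same positive score because the older item is also taken into account.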

15.6 | LG | Jun 16, 2022 | Code

Zero-Shot AutoML with Pretrained Models

Ekrem Öztürk, Fabio Ferreira, Hadi S. Jomaa et al.

Given a new dataset D and a low compute budget, how should we choose a pre-trained model to fine-tune on D and set the fine-tuning hyperparameters without risking overfitting, particularly if D is small? Here, we extend automated machine learning (AutoML) so that it can make these choices. Our domain-independent meta-learning approach learns a zero-shot surrogate model that, at test time, allows selecting the right deep learning (DL) pipeline (including the pre-trained model and fine-tuning hyperparameters) for a new dataset D given only trivial meta-features describing D, such as image resolution or the number of classes. To train this zero-shot model, we collect performance data for many DL pipelines on a large collection of datasets and meta-train on this data to minimize a pairwise ranking objective. We evaluate our approach under the strict time limit of the vision track of the ChaLearn AutoDL challenge benchmark, clearly outperforming all challenge contenders.
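
A pairwise ranking objective only asks the surrogate to order pipelines correctly for each dataset, not to regress their exact scores. The RankNet-style logistic loss below is one common choice, assumed here for illustration; the abstract does not specify which pairwise loss is used, and the function name is hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(scores: Sequence[float], performances: Sequence[float]) -> float:
    """Logistic (RankNet-style) loss over all pipeline pairs for ONE dataset.

    scores: surrogate predictions per DL pipeline; performances: observed performance.
    Pairs whose predicted order disagrees with the observed order are penalised most.
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if performances[i] == performances[j]:
            continue  # ties carry no ordering information
        # Orient the pair so that `hi` is the better-performing pipeline.
        hi, lo = (i, j) if performances[i] > performances[j] else (j, i)
        margin = scores[hi] - scores[lo]
        total += math.log1p(math.exp(-margin))
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0
```

At test time only the ordering matters: the pipeline with the highest surrogate score for the new dataset's meta-features would be selected.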

6.9 | LG | Sep 2, 2022

When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development

Nghia Duong-Trung, Stefan Born, Jong Woo Kim et al.

Machine learning (ML) is becoming increasingly crucial in many fields of engineering but has not yet realized its full potential in bioprocess engineering. While experimentation has been accelerated by increasing levels of lab automation, experimental planning and data modeling still largely depend on human intervention. ML can be seen as a set of tools that contribute to the automation of the whole experimental cycle, including model building and practical planning, thus allowing human experts to focus on more demanding and overarching cognitive tasks. First, probabilistic programming is used for the autonomous building of predictive models. Second, machine learning automatically assesses alternative decisions by planning experiments to test hypotheses and conducting investigations to gather informative data that focus on model selection based on the uncertainty of model predictions. This review provides a comprehensive overview of ML-based automation in bioprocess development. On the one hand, the biotech and bioengineering community should be aware of the potential and, most importantly, the limitations of existing ML solutions for their application in biotechnology and biopharma. On the other hand, it is essential to identify the missing links that would enable the easy implementation of ML and Artificial Intelligence (AI) tools in valuable solutions for the bio-community.

16.1 | LG | Jun 27, 2022 | Code

Learning to Control Local Search for Combinatorial Optimization

Jonas K. Falkner, Daniela Thyssens, Ahmad Bdeir et al.

Combinatorial optimization problems are encountered in many practical contexts such as logistics and production, but exact solutions are particularly difficult to find, since these problems are usually NP-hard for considerable problem sizes. To compute approximate solutions, a zoo of generic as well as problem-specific variants of local search is commonly used. However, which variant to apply to which particular problem is difficult to decide, even for experts. In this paper, we identify three independent algorithmic aspects of such local search algorithms and formalize their sequential selection over an optimization process as a Markov Decision Process (MDP). We design a deep graph neural network as the policy model for this MDP, yielding a learned controller for local search called NeuroLS. Ample experimental evidence shows that NeuroLS is able to outperform both well-known general-purpose local search controllers from Operations Research and the latest machine learning-based approaches.

12.2 | AI | May 5, 2022 | Code

End-to-End Image-Based Fashion Recommendation

Shereen Elsayed, Lukas Brinkmeyer, Lars Schmidt-Thieme

In fashion-based recommendation settings, incorporating item image features is considered a crucial factor, and it has yielded significant improvements for many traditional models, including but not limited to matrix factorization, auto-encoders, and nearest neighbor models. While there are numerous image-based recommender approaches that use dedicated deep neural networks, comparisons to attribute-aware models are often neglected, even though such models can easily be extended to leverage items' image features. In this paper, we propose a simple yet effective attribute-aware model that incorporates image features for better item representation learning in item recommendation tasks. The proposed model uses items' image features extracted by a calibrated ResNet50 component. We present an ablation study comparing three different techniques for incorporating the image features into the recommender system component, which can seamlessly leverage any available item attributes. Experiments on two real-world image-based recommender system datasets show that the proposed model significantly outperforms all state-of-the-art image-based models.

9.6 | LG | Apr 7, 2022 | Code

Few-Shot Forecasting of Time-Series with Heterogeneous Channels

Lukas Brinkmeyer, Rafael Rego Drumond, Johannes Burchert et al.

Learning complex time-series forecasting models usually requires a large amount of data, as each model is trained from scratch for each task/dataset. Leveraging learning experience with similar datasets is a well-established technique for classification problems, known as few-shot classification. However, existing approaches cannot be applied to time-series forecasting because i) multivariate time-series datasets have different channels and ii) forecasting is principally different from classification. In this paper, we formalize the problem of few-shot forecasting of time series with heterogeneous channels for the first time. Extending recent work on heterogeneous attributes in vector data, we develop a model composed of permutation-invariant deep set blocks that incorporate a temporal embedding. We assemble the first meta-dataset of 40 multivariate time-series datasets and show through experiments that our model generalizes well, outperforming baselines carried over from simpler scenarios that either fail to learn across tasks or miss temporal information.
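
Permutation invariance is what lets one model consume datasets with different numbers and orders of channels. A minimal Deep Sets-style sketch of the form rho(sum phi(x)) follows; the fixed `phi` and `rho` functions stand in for learned networks, and encoding each set element as a (value, time stamp) pair with a sine/cosine temporal term is a hypothetical simplification, not the paper's architecture.

```python
import math
from typing import Callable, List, Optional, Sequence

Vec = List[float]


def deep_set(elements: Sequence[Vec], phi: Callable[[Vec], Vec], rho: Callable[[Vec], Vec]) -> Vec:
    """Permutation-invariant set function rho(sum_i phi(x_i)); expects a non-empty set."""
    pooled: Optional[Vec] = None
    for x in elements:
        h = phi(x)
        # Summation is order-independent, so the result does not depend on element order.
        pooled = h if pooled is None else [a + b for a, b in zip(pooled, h)]
    return rho(pooled)


# Hypothetical stand-ins for learned MLPs. Each element is (value, time stamp);
# the sin/cos terms act as a crude temporal embedding.
def phi(x: Vec) -> Vec:
    value, t = x
    return [math.tanh(value), value * math.sin(t), value * math.cos(t)]


def rho(h: Vec) -> Vec:
    return [2.0 * v for v in h]
```

The output dimension is fixed by `phi`, not by the number of elements, so the same function applies to a dataset with two channels and one with ten.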

10.4 | LG | Jul 14, 2022

Attention, Filling in The Gaps for Generalization in Routing Problems

Ahmad Bdeir, Jonas K. Falkner, Lars Schmidt-Thieme

Machine Learning (ML) methods have become a useful tool for tackling vehicle routing problems, either in combination with popular heuristics or as standalone models. However, current methods suffer from poor generalization when tackling problems of different sizes or distributions. As a result, ML in vehicle routing has witnessed an expansion phase, with new methodologies being created for particular problem instances that become infeasible at larger problem sizes. This paper aims to encourage the consolidation of the field by understanding and improving existing models, namely the attention model of Kool et al. We identify two categories of discrepancy for VRP generalization. The first is based on differences inherent to the problems themselves, and the second relates to architectural weaknesses that limit the model's ability to generalize. Our contribution is threefold: we first target model discrepancies by adapting the Kool et al. method and its loss function for Sparse Dynamic Attention based on the alpha-entmax activation. We then target inherent differences through a mixed-instance training method that has been shown to outperform single-instance training in certain scenarios. Finally, we introduce a framework for inference-level data augmentation that improves performance by leveraging the model's lack of invariance to rotation and dilation changes.
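
For alpha = 2, alpha-entmax reduces to sparsemax, the Euclidean projection onto the probability simplex, which can assign exactly zero attention weight to irrelevant nodes (softmax never does). The sketch below implements only this alpha = 2 special case; general alpha requires an iterative solver such as bisection, and which alpha the paper uses is not stated here.

```python
from typing import List, Sequence


def sparsemax(z: Sequence[float]) -> List[float]:
    """Project a non-empty score vector z onto the probability simplex (alpha-entmax, alpha = 2)."""
    zs = sorted(z, reverse=True)
    cumsum, k, tau_sum = 0.0, 0, 0.0
    # Support size k = largest i with 1 + i * z_(i) > sum_{j <= i} z_(j).
    for i, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + i * v > cumsum:
            k, tau_sum = i, cumsum
    # Threshold tau shifts the supported scores so that the outputs sum to one.
    tau = (tau_sum - 1) / k
    return [max(v - tau, 0.0) for v in z]
```

For example, `sparsemax([1.0, 0.5, -1.0])` returns `[0.75, 0.25, 0.0]`: the lowest-scoring entry is cut off entirely.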

13.3 | LG | Jul 8

The Importance of Encoder Choice: A Tabular-Image Study

Ilia Koloiarov, Diego Coello de Portugal Mecke, Vijaya Krishna Yalavarthi et al.

Multimodal learning usually requires a dedicated encoder per modality. When a tabular modality is involved, prior work has mostly used a plain MLP as the encoder. Yet if it were a strong encoder, the tabular domain would not be "the last unconquered castle for deep learning". This study evaluates state-of-the-art tabular models as encoders in the image-tabular setting for the first time. One obstacle stands out: In-Context Learning models, among the best-performing methods in the tabular domain, require labels to process instances, which makes it non-trivial to embed training and test instances in the same way. We address this problem across multiple models of this family. With this study, we aim to highlight the importance of the encoder choice in multimodal learning.

4.6 | LG | Dec 22, 2022 | Code

Few-shot human motion prediction for heterogeneous sensors

Rafael Rego Drumond, Lukas Brinkmeyer, Lars Schmidt-Thieme

Human motion prediction is a complex task, as it involves forecasting variables over time on a graph of connected sensors. This is especially true for few-shot learning, where we strive to forecast motion sequences for previously unseen actions based on only a few examples. Despite this, almost all related approaches for few-shot motion prediction do not incorporate the underlying graph, even though it is a common component in classical motion prediction. Furthermore, state-of-the-art methods for few-shot motion prediction are restricted to motion tasks with a fixed output space, meaning that these tasks are all limited to the same sensor graph. In this work, we propose to extend recent work on few-shot time-series forecasting with heterogeneous attributes using graph neural networks, introducing the first few-shot motion approach that explicitly incorporates the spatial graph while also generalizing across motion tasks with heterogeneous sensors. In our experiments on motion tasks with heterogeneous sensors, we demonstrate significant performance improvements, with lifts ranging from 10.4% to 39.3% compared to the best state-of-the-art models. Moreover, we show that our model performs on par with the best approach so far when evaluated on tasks with a fixed output space, while using two orders of magnitude fewer parameters.

3.7 | CV | Oct 1, 2022 | Code

Offline Handwritten Amharic Character Recognition Using Few-shot Learning

Mesay Samuel, Lars Schmidt-Thieme, DP Sharma et al.

Few-shot learning is an important but challenging problem in machine learning that aims at learning from only a few labeled training examples. It has become an active area of research because deep learning requires huge amounts of labeled data, which are often infeasible to obtain in the real world. Learning from a few examples is also an important step towards learning like humans. Few-shot learning has shown great promise in different areas of machine learning applications, particularly in image classification. As it is a recent technique, most researchers focus on understanding and solving the issues related to its concept, working only with common image datasets such as Mini-ImageNet and Omniglot. Few-shot learning also opens an opportunity to address low-resource languages like Amharic. In this study, offline handwritten Amharic character recognition using few-shot learning is addressed. In particular, prototypical networks, a popular and simple type of few-shot learning, are implemented as a baseline. Exploiting the row-wise and column-wise similarities of the Amharic alphabet, a novel way of augmenting the training episodes is proposed. The experimental results show that the proposed method outperforms the baseline method. This study applies few-shot learning to Amharic characters for the first time. More importantly, its findings open new ways of examining the influence of training episodes in few-shot learning, an important issue that needs further exploration. The datasets used in this study were collected from native Amharic writers using an Android app developed as part of this study.
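
The prototypical-network baseline classifies a query by its distance to per-class mean embeddings computed from the episode's support set. A minimal sketch, assuming embeddings have already been produced by some encoder (not shown); the class labels in the usage example are hypothetical transliterations, and the episode augmentation proposed in the paper is not reproduced.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Vec = List[float]


def prototypes(support: Sequence[Tuple[Vec, Hashable]]) -> Dict[Hashable, Vec]:
    """Class prototype = mean embedding of that class's support examples."""
    groups: Dict[Hashable, List[Vec]] = defaultdict(list)
    for emb, label in support:
        groups[label].append(emb)
    return {c: [sum(col) / len(embs) for col in zip(*embs)] for c, embs in groups.items()}


def classify(query: Vec, protos: Dict[Hashable, Vec]) -> Hashable:
    """Assign the query to the nearest prototype under squared Euclidean distance."""
    return min(protos, key=lambda c: sum((q - p) ** 2 for q, p in zip(query, protos[c])))
```

With support embeddings `[0, 0]` and `[0, 2]` for class "ha" and `[4, 4]` and `[6, 4]` for class "hu", the prototypes are `[0, 1]` and `[5, 4]`, and a query at `[1, 1]` is assigned to "ha".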

6.9 | LG | Aug 24, 2022 | Code

DCSF: Deep Convolutional Set Functions for Classification of Asynchronous Time Series

Vijaya Krishna Yalavarthi, Johannes Burchert, Lars Schmidt-Thieme

An asynchronous time series (AsTS) is a multivariate time series in which all channels are observed asynchronously and independently, making the time series extremely sparse when the channels are aligned. We often observe this effect in applications with complex observation processes, such as health care, climate science, and astronomy, to name a few. Because of their asynchronous nature, such series pose a significant challenge to deep learning architectures, which presume that the time series presented to them are regularly sampled, fully observed, and aligned with respect to time. This paper proposes a novel framework called Deep Convolutional Set Functions (DCSF), which is highly scalable and memory-efficient, for the asynchronous time series classification task. Building on recent advances in deep set learning architectures, we introduce a model that is invariant to the order in which the time series' channels are presented to it. To encode the set elements, we explore convolutional neural networks, which are well researched for the closely related problem of classifying regularly sampled and fully observed time series. We evaluate DCSF for AsTS classification and online (per time point) AsTS classification. Our extensive experiments on multiple real-world and synthetic datasets verify that the proposed model performs substantially better than a range of state-of-the-art models in terms of accuracy and run time.

4.6 | LG | May 2, 2022 | Code

Large Neighborhood Search based on Neural Construction Heuristics

Jonas K. Falkner, Daniela Thyssens, Lars Schmidt-Thieme

We propose a Large Neighborhood Search (LNS) approach that uses a learned construction heuristic based on neural networks as the repair operator to solve the vehicle routing problem with time windows (VRPTW). Our method uses graph neural networks to encode the problem, decodes a solution auto-regressively, and is trained with reinforcement learning on the construction task without requiring any labels for supervision. The neural repair operator is combined with a local search routine, heuristic destruction operators, and a selection procedure applied to a small population to arrive at a sophisticated solution approach. The key idea is to use the learned model to reconstruct the partially destroyed solution and to introduce randomness via the destruction heuristics (or the stochastic policy itself) to effectively explore a large neighborhood.
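
The destroy-and-repair loop at the core of LNS can be sketched on a plain TSP. In this hypothetical sketch, random node removal plays the role of the heuristic destruction operator and greedy cheapest insertion stands in for the learned neural repair operator; there are no time windows, no local search routine, and no population, so it illustrates the principle rather than the proposed method.

```python
import math
import random
from typing import List, Tuple

Coord = Tuple[float, float]


def tour_length(tour: List[int], pts: List[Coord]) -> float:
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]]) for i in range(len(tour)))


def destroy(tour: List[int], k: int, rng: random.Random):
    """Heuristic destruction: remove k randomly chosen nodes from the tour."""
    removed = rng.sample(tour, k)
    return [c for c in tour if c not in removed], removed


def repair(partial: List[int], removed: List[int], pts: List[Coord]) -> List[int]:
    """Greedy cheapest insertion (stand-in for a learned neural construction model)."""
    tour = list(partial)
    for c in removed:
        best_pos, best_cost = 0, math.inf
        for i in range(len(tour)):
            a, b = tour[i], tour[(i + 1) % len(tour)]
            # Extra length caused by inserting c between a and b.
            cost = math.dist(pts[a], pts[c]) + math.dist(pts[c], pts[b]) - math.dist(pts[a], pts[b])
            if cost < best_cost:
                best_pos, best_cost = i + 1, cost
        tour.insert(best_pos, c)
    return tour


def lns(pts: List[Coord], iters: int = 200, k: int = 3, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    best = list(range(len(pts)))
    for _ in range(iters):
        cand = repair(*destroy(best, k, rng), pts)
        if tour_length(cand, pts) < tour_length(best, pts):
            best = cand  # accept only improvements (the simplest acceptance rule)
    return best
```

Randomness enters only through `destroy`, mirroring the idea that the destruction heuristics drive exploration of the neighborhood while the repair step rebuilds a good solution.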

2.0 | IR | Oct 18, 2022 | Code

Deep Multi-Representation Model for Click-Through Rate Prediction

Shereen Elsayed, Lars Schmidt-Thieme

Click-through rate (CTR) prediction is a crucial task in recommender systems, and it has gained considerable attention in the past few years. Recent research primarily emphasizes obtaining meaningful and powerful representations by mining low- and high-order feature interactions using various components such as Deep Neural Networks (DNN), CrossNets, or transformer blocks. In this work, we propose the Deep Multi-Representation model (DeepMR), which jointly trains a mixture of two powerful feature representation learning components, namely DNNs and multi-head self-attention. Furthermore, DeepMR integrates the novel residual with zero initialization (ReZero) connections into the DNN and the multi-head self-attention components to learn superior input representations. Experiments on three real-world datasets show that the proposed model significantly outperforms all state-of-the-art models in the task of click-through rate prediction.
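
A ReZero connection adds the sub-layer output through a scalar gate that starts at zero, so each block is initially the identity: y = x + alpha * F(x), with alpha = 0 at initialization. A minimal, framework-free sketch; the class name is hypothetical, and in a real model `alpha` would be a trainable parameter updated by the optimizer.

```python
from typing import Callable, List

Vec = List[float]


class ReZeroBlock:
    """Residual connection with a zero-initialised scalar gate: y = x + alpha * F(x)."""

    def __init__(self, sublayer: Callable[[Vec], Vec]):
        self.sublayer = sublayer
        # Trainable in practice; starting at zero makes the block an identity map at init.
        self.alpha = 0.0

    def __call__(self, x: Vec) -> Vec:
        fx = self.sublayer(x)
        return [xi + self.alpha * fi for xi, fi in zip(x, fx)]
```

Because the gate starts at zero, gradients flow through the identity path at the beginning of training, and the sub-layer (a DNN or self-attention block in DeepMR's case) is phased in as `alpha` grows.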

3.3 | LG | Jul 4, 2022 | Code

Solving the Traveling Salesperson Problem with Precedence Constraints by Deep Reinforcement Learning

Christian Löwens, Inaam Ashraf, Alexander Gembus et al.

This work presents solutions to the Traveling Salesperson Problem with precedence constraints (TSPPC) using Deep Reinforcement Learning (DRL) by adapting recent approaches that work well for regular TSPs. Common to these approaches is the use of graph models based on multi-head attention (MHA) layers. One idea for solving the pickup and delivery problem (PDP) is to use heterogeneous attention to embed the different roles each node can take. In this work, we generalize this concept of heterogeneous attention to the TSPPC. Furthermore, we adapt recent ideas for sparsifying attention to achieve better scalability. Overall, we contribute to the research community through the application and evaluation of recent DRL methods for solving the TSPPC.
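
Construction policies typically enforce precedence constraints by masking infeasible nodes at every decoding step. The sketch below shows one such mask, assuming the constraints are given as a hypothetical `preds` mapping from a node to the set of nodes that must be visited before it (e.g. a delivery after its pickup); it is an illustration, not the paper's code.

```python
from typing import Dict, List, Set


def feasible_next_nodes(visited: Set[int], n_nodes: int,
                        preds: Dict[int, Set[int]]) -> List[bool]:
    """Action mask for one construction step (True = node may be visited next).

    A node is feasible if it is unvisited and all of its required
    predecessors have already been visited.
    """
    return [node not in visited and preds.get(node, set()) <= visited
            for node in range(n_nodes)]


def masked_logits(logits: List[float], mask: List[bool]) -> List[float]:
    """Infeasible nodes get -inf, so a softmax policy assigns them zero probability."""
    return [z if ok else float("-inf") for z, ok in zip(logits, mask)]
```

With `preds = {2: {1}, 3: {1}}` and only the depot 0 visited, node 1 is the sole feasible choice; once node 1 is visited, nodes 2 and 3 become available.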

2.0 | LG | Jul 19, 2023 | Code

Forecasting Early with Meta Learning

Shayan Jawed, Kiran Madhusudhanan, Vijaya Krishna Yalavarthi et al.

In the early observation period of a time series, only a few historic observations might be available to learn a model. However, in cases where a prior set of datasets is available, meta-learning methods can be applied. In this paper, we devise a meta-learning method that exploits samples from additional datasets and learns to augment time series through adversarial learning as an auxiliary task for the target dataset. Our model, FEML, is equipped with a shared convolutional backbone that learns features for varying-length inputs from different datasets and has dataset-specific heads to forecast different output lengths. We show that FEML can meta-learn across datasets and that, by additionally learning on adversarially generated samples as auxiliary samples for the target dataset, it can improve forecasting performance compared to single-task learning and to various solutions adapted from joint learning, multi-task learning, and classic forecasting baselines.

1.5 | CV | Apr 14, 2023

Directly Optimizing IoU for Bounding Box Localization

Mofassir ul Islam Arif, Mohsan Jameel, Lars Schmidt-Thieme

Object detection has seen remarkable progress in recent years with the introduction of Convolutional Neural Networks (CNN). Object detection is a multi-task learning problem in which both the positions of the objects in the images and their classes need to be correctly identified. The idea here is to maximize the overlap between the ground-truth bounding boxes and the predictions, i.e., the Intersection over Union (IoU). In current work in this domain, IoU is approximated by using the Huber loss as a proxy, but this indirect method does not leverage the IoU information and treats the bounding box as four independent, unrelated regression terms. This does not hold for a bounding box, whose four coordinates are highly correlated and hold a semantic meaning when taken together. Direct optimization of the IoU is not possible due to its non-convex and non-differentiable nature. In this paper, we formulate a novel loss, namely the Smooth IoU, which directly optimizes the IoU of the bounding boxes. This loss has been evaluated on the Oxford-IIIT Pets, Udacity self-driving car, PASCAL VOC, and VWFS Car Damage datasets and has shown performance gains over the standard Huber loss.
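
The argument that coordinate-wise regression ignores the joint structure of a box can be checked numerically. The sketch compares plain IoU with a coordinate-wise Huber loss (delta = 1 assumed): two predictions with identical Huber loss can overlap the ground truth quite differently. It computes standard IoU only; the paper's Smooth IoU loss is not reproduced here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def huber(pred: Box, target: Box, delta: float = 1.0) -> float:
    """Huber loss summed over the four coordinates, each treated independently."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total
```

For the target `(0, 0, 2, 2)`, the predictions `(1, 0, 3, 2)` and `(0, 0, 3, 3)` both cost 1.0 under the Huber loss, yet their IoUs with the target are 1/3 and 4/9.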

5.3 | LG | Feb 10, 2023 | Code

Neural Capacitated Clustering

Jonas K. Falkner, Lars Schmidt-Thieme

Recent work on deep clustering has found promising new methods for constrained clustering problems as well. Their typically pairwise constraints can often be used to guide the partitioning of the data. Many problems, however, feature cluster-level constraints, e.g., the Capacitated Clustering Problem (CCP), where each point has a weight and the total weight of all points in each cluster is bounded by a prescribed capacity. In this paper, we propose a new method for the CCP, Neural Capacitated Clustering, which learns a neural network to predict the assignment probabilities of points to cluster centers from a dataset of optimal or near-optimal past solutions of other problem instances. During inference, the resulting scores are then used in an iterative k-means-like procedure to refine the assignment under capacity constraints. In our experiments on artificial data and two real-world datasets, our approach outperforms several state-of-the-art mathematical and heuristic solvers from the literature. Moreover, we apply our method in the context of a cluster-first-route-second approach to the Capacitated Vehicle Routing Problem (CVRP) and show competitive results on the well-known Uchoa benchmark.
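
The capacity-constrained assignment step of a k-means-like procedure can be illustrated with a simple greedy rule: serve (point, center) pairs from best to worst and skip any center that would exceed its capacity. In this hypothetical sketch, plain Euclidean distance replaces the learned assignment scores of Neural Capacitated Clustering, and the center update step is omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float]


def capacitated_assign(points: Sequence[Coord], weights: Sequence[float],
                       centers: Sequence[Coord], capacity: float) -> List[Optional[int]]:
    """Greedy capacity-respecting assignment; closest (point, center) pairs are served first.

    Returns the center index for each point, or None if no center has room left.
    """
    pairs = sorted(
        (math.dist(p, c), i, j) for i, p in enumerate(points) for j, c in enumerate(centers)
    )
    load = [0.0] * len(centers)
    assign: List[Optional[int]] = [None] * len(points)
    for _, i, j in pairs:
        # Assign point i to center j only if i is still free and j has enough capacity left.
        if assign[i] is None and load[j] + weights[i] <= capacity:
            assign[i] = j
            load[j] += weights[i]
    return assign
```

With three unit-weight points at x = 0, 1, 2, centers at x = 0 and x = 10, and capacity 2, the third point is pushed to the distant center because the near one is full.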

1.5 | CV | Apr 14, 2023

Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks

Mofassir ul Islam Arif, Mohsan Jameel, Josif Grabocka et al.

The strength of machine learning models stems from their ability to learn complex function approximations from data; however, this strength also makes training deep neural networks challenging. Notably, complex models tend to memorize the training data, which results in poor generalization on test data. Regularization techniques such as L1, L2, and dropout have been proposed to reduce overfitting; however, they bring in additional hyperparameter-tuning complexity. These methods also fall short when inter-class similarity is high due to the underlying data distribution, leading to less accurate models. In this paper, we present a novel approach that regularizes models by leveraging information-rich latent embeddings and their high intra-class correlation. We create phantom embeddings from a subset of homogeneous samples and use them to decrease the inter-class similarity of instances in their latent embedding space. The resulting models generalize better, and this regularization does not require an expensive hyperparameter search. We evaluate our method on two popular and challenging image classification datasets (CIFAR and FashionMNIST) and show that our approach outperforms the standard baselines while displaying better training behavior.

1.8 | LG | Dec 5, 2022 | Code

Auxiliary Quantile Forecasting with Linear Networks

Shayan Jawed, Lars Schmidt-Thieme

We propose a novel multi-task method for quantile forecasting with shared linear layers. Our method is based on the implicit quantile learning approach, in which samples from the uniform distribution $\mathcal{U}(0, 1)$ are reparameterized to quantile values of the target distribution. We combine the implicit quantile and input time-series representations to directly forecast multiple quantile estimates for multiple horizons jointly. Prior works have adopted a linear layer for the direct estimation of all forecasting horizons in a multi-task learning setup. We show that, following a similar intuition from multi-task learning to exploit correlations among forecast horizons, we can model multiple quantile estimates as auxiliary tasks for each forecast horizon to improve forecast accuracy across the quantile estimates, compared to modeling only a single quantile estimate. We show that learning auxiliary quantile tasks leads to state-of-the-art performance on deterministic forecasting benchmarks for the main task of forecasting the 50$^{th}$ percentile estimate.
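
Quantile forecasts are typically trained with the pinball (quantile) loss; in implicit quantile learning, the level tau is drawn from U(0, 1) for each example and passed to the model. The sketch shows the loss and a Monte Carlo version of that objective; `predict` is a hypothetical model interface, and whether the paper uses exactly this loss is an assumption. At tau = 0.5 the loss is half the absolute error, so its minimizer is the median, i.e. the 50th percentile.

```python
import random
from typing import Callable, Sequence


def pinball_loss(y: float, y_hat: float, tau: float) -> float:
    """Quantile (pinball) loss of prediction y_hat for quantile level tau in [0, 1]."""
    diff = y - y_hat
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return max(tau * diff, (tau - 1) * diff)


def best_constant(ys: Sequence[float], tau: float, candidates: Sequence[float]) -> float:
    """Candidate with the lowest total pinball loss, i.e. an empirical tau-quantile."""
    return min(candidates, key=lambda c: sum(pinball_loss(y, c, tau) for y in ys))


def sampled_quantile_loss(ys: Sequence[float], predict: Callable[[float], float],
                          rng: random.Random) -> float:
    """One Monte Carlo estimate of the implicit-quantile objective (tau ~ U(0, 1) per target)."""
    total = 0.0
    for y in ys:
        tau = rng.random()  # sample a quantile level
        total += pinball_loss(y, predict(tau), tau)
    return total / len(ys)
```

For the targets 1 to 5, `best_constant` picks 3 at tau = 0.5 and 5 at tau = 0.9, matching the empirical median and upper quantile.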

9.8 | LG | Sep 29, 2023 | Code

Too Big, so Fail? -- Enabling Neural Construction Methods to Solve Large-Scale Routing Problems

Jonas K. Falkner, Lars Schmidt-Thieme

In recent years, new deep learning approaches to solving combinatorial optimization problems, in particular NP-hard Vehicle Routing Problems (VRP), have been proposed. The most impactful of these methods are sequential neural construction approaches, which are usually trained via reinforcement learning. Due to the high training costs of these models, they are usually trained on limited instance sizes (e.g., serving 100 customers) and later applied to vastly larger instances (e.g., 2000 customers). By means of a systematic scale-up study, we show that even state-of-the-art neural construction methods are outperformed by simple heuristics, failing to generalize to larger problem instances. We propose to use the ruin-recreate principle, which alternates between completely destroying a localized part of the solution and then recreating an improved variant. In this way, neural construction methods like POMO are never applied to the global problem but only in the reconstruction step, which involves only partial problems much closer in size to their original training instances. In thorough experiments on four datasets of varying distributions and modalities, we show that our neural ruin-recreate approach outperforms alternative ways of improving construction methods, such as sampling and beam search, and in several experiments also advanced local search approaches.

10.7 | LG | Oct 6, 2023

Routing Arena: A Benchmark Suite for Neural Routing Solvers

Daniela Thyssens, Tim Dernedde, Jonas K. Falkner et al.

Neural Combinatorial Optimization has been actively researched over the last eight years. Even though many of the proposed Machine Learning-based approaches are compared on the same datasets, the evaluation protocol exhibits essential flaws, and the selection of baselines often neglects state-of-the-art Operations Research approaches. To improve on both shortcomings, we propose the Routing Arena, a benchmark suite for routing problems that seamlessly integrates consistent evaluation with the provision of baselines and benchmarks prevalent in the Machine Learning and Operations Research fields. The proposed evaluation protocol considers the two most important evaluation cases for different applications: first, the solution quality for an a priori fixed time budget, and second, the anytime performance of the respective methods. By relating the solution trajectory to a Best Known Solution and to a Base Solver's solution trajectory, we further propose the Weighted Relative Average Performance (WRAP), a novel evaluation metric that quantifies the often-claimed runtime efficiency of Neural Routing Solvers. A comprehensive first experimental evaluation demonstrates that the most recent Operations Research solvers generate state-of-the-art results in terms of solution quality and runtime efficiency for the vehicle routing problem. Nevertheless, some findings highlight the advantages of neural approaches and motivate a shift in how neural solvers should be conceptualized.

8.7 | LG | Oct 5, 2022 | Code

Tripletformer for Probabilistic Interpolation of Irregularly sampled Time Series

Vijaya Krishna Yalavarthi, Johannes Burchert, Lars Schmidt-Thieme

Irregularly sampled time series data with missing values are observed in many fields, such as healthcare, astronomy, and climate science. Interpolation of these types of time series is crucial for tasks such as root cause analysis and medical diagnosis, as well as for smoothing out irregular or noisy data. To address this challenge, we present a novel encoder-decoder architecture called "Tripletformer" for probabilistic interpolation of irregularly sampled time series with missing values. This attention-based model operates on sets of observations, where each element is a triple of time, channel, and value. The encoder and decoder of the Tripletformer are built from attention layers and fully connected layers, enabling the model to effectively process the presented set elements. We evaluate the Tripletformer against a range of baselines on multiple real-world and synthetic datasets and show that it produces more accurate and certain interpolations. Results indicate an improvement in negative log-likelihood of up to 32% on real-world datasets and 85% on synthetic datasets when using the Tripletformer compared to the next best model.
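
The triple representation sidesteps alignment: every observed value becomes one set element, and missing entries simply do not appear. A minimal sketch of that conversion, assuming a dense input layout with `None` or NaN marking missing values (the function name and layout are hypothetical); interpolation queries would then be (time, channel) pairs for which a decoder predicts a distribution over the value.

```python
import math
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[float, int, float]  # (time, channel index, value)


def to_triples(times: Sequence[float],
               channels: Sequence[Sequence[Optional[float]]]) -> List[Triple]:
    """Flatten an irregularly observed multivariate series into a set of triples.

    channels[c][k] is the value of channel c at times[k], or None/NaN when missing.
    Missing entries are dropped, so no imputation or alignment is required.
    """
    triples = []
    for c, series in enumerate(channels):
        for t, v in zip(times, series):
            if v is not None and not math.isnan(v):
                triples.append((t, c, v))
    return triples
```

Because the result is a set, the number of elements varies with how much was observed, which is why a set-based encoder such as an attention model fits this input.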

17.0 | LG | Aug 1, 2024 | Code

A Cross-Domain Benchmark for Active Learning

Thorben Werner, Johannes Burchert, Maximilian Stubbemann et al.

Active Learning (AL) deals with identifying the most informative samples for labeling in order to reduce data annotation costs for supervised learning tasks. AL research suffers from the fact that lifts reported in the literature generalize poorly and that only a small number of experiment repetitions are conducted. To overcome these obstacles, we propose CDALBench, the first active learning benchmark that includes tasks in computer vision, natural language processing, and tabular learning. Furthermore, by providing an efficient, greedy oracle, CDALBench can be evaluated with 50 runs for each experiment. We show that both the cross-domain character and a large number of repetitions are crucial for a sophisticated evaluation of AL research. Concretely, we show that the superiority of specific methods varies across domains, making it important to evaluate Active Learning with a cross-domain benchmark. Additionally, we show that a large number of runs is crucial. With only three runs, as is often done in the literature, the superiority of specific methods can vary strongly with the specific runs. This effect is so strong that, depending on the seed, even a well-established method's performance can be significantly better and significantly worse than random for the same dataset.

4.0IRApr 29, 2024Code

HMAR: Hierarchical Masked Attention for Multi-Behaviour Recommendation

Shereen Elsayed, Ahmed Rashed, Lars Schmidt-Thieme

In the context of recommendation systems, addressing multi-behavioral user interactions has become vital for understanding evolving user behavior. Recent models use techniques such as graph neural networks and attention mechanisms to model diverse behaviors, but capturing sequential patterns in historical interactions remains challenging. To tackle this, we introduce Hierarchical Masked Attention for multi-behavior recommendation (HMAR). Specifically, our approach applies masked self-attention to items of the same behavior, followed by self-attention across all behaviors. Additionally, we propose historical behavior indicators to encode the historical frequency of each item's behavior in the input sequence. Furthermore, the HMAR model operates in a multi-task setting, allowing it to learn item behaviors and their associated ranking scores concurrently. Extensive experimental results on four real-world datasets demonstrate that our proposed model outperforms state-of-the-art methods. Our code and datasets are available at https://github.com/Shereen-Elsayed/HMAR.
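
The two attention stages can be illustrated with attention masks. The sketch below is an assumption-based illustration, not HMAR's implementation: it builds a boolean mask that lets each position attend only to positions with the same behavior (stage one) and applies a masked softmax to raw attention scores; an all-True mask then corresponds to attention across all behaviors (stage two). The helper names are hypothetical.

```python
import math


def same_behavior_mask(behaviors):
    """mask[i][j] is True when position i may attend to position j (same behavior)."""
    n = len(behaviors)
    return [[behaviors[i] == behaviors[j] for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask_row):
    """Softmax over allowed positions only; masked positions receive weight 0."""
    top = max(s for s, m in zip(scores, mask_row) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


behaviors = ["click", "cart", "click", "buy"]
mask = same_behavior_mask(behaviors)
# Stage 1: the first "click" item attends only to other "click" items.
print(masked_softmax([1.0, 5.0, 1.0, 5.0], mask[0]))  # [0.5, 0.0, 0.5, 0.0]
# Stage 2: an all-True mask lets every item attend across all behaviors.
print(masked_softmax([1.0, 5.0, 1.0, 5.0], [True] * 4))
```

Because every position shares its own behavior, the diagonal of the stage-one mask is always True, so no row is ever fully masked.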

7.1LGMar 11

LAtte: Hyperbolic Lorentz Attention for Cross-Subject EEG Classification

Johannes Burchert, Ahmad Bdeir, Tom Hanika et al.

Electroencephalogram (EEG) classification is critical for applications ranging from medical diagnostics to brain-computer interfaces, yet it remains challenging due to the inherently low signal-to-noise ratio (SNR) and high inter-subject variability. To address these issues, we propose LAtte, a novel framework that integrates a Lorentz Attention Module with an InceptionTime-based encoder to enable robust and generalizable EEG classification. Unlike prior work, which primarily evaluates single-subject performance, LAtte focuses on cross-subject training. First, we learn a shared baseline signal across all subjects using pretraining tasks to capture common underlying patterns. Then, we use novel Lorentz low-rank adapters to learn subject-specific embeddings that model individual differences. This allows us to learn a shared model that performs robustly across subjects and can subsequently be fine-tuned for individual subjects or used to generalize to unseen subjects. We evaluate LAtte on three well-established EEG datasets, achieving a substantial improvement in performance over current state-of-the-art methods.

7.1LGMar 28, 2025Code

STADE: Standard Deviation as a Pruning Metric

Diego Coello de Portugal Mecke, Haya Alyoussef, Maximilian Stubbemann et al.

Recently, Large Language Models (LLMs) have become widespread and are used to solve a wide variety of tasks. To handle these tasks successfully, LLMs require longer training times and larger model sizes. This makes LLMs ideal candidates for pruning methods that reduce computational demands while maintaining performance. Earlier methods require a retraining phase after pruning to maintain the original model's performance. However, state-of-the-art pruning methods, such as Wanda, prune the model without retraining, making the pruning process faster and more efficient. Building upon Wanda's work, this study provides a theoretical explanation of why the method is effective and leverages these insights to enhance the pruning process. Specifically, a theoretical analysis of the pruning problem reveals a common scenario in Machine Learning where Wanda is the optimal pruning method. Furthermore, this analysis is extended to cases where Wanda is no longer optimal, leading to the development of a new method, STADE, based on the standard deviation of the input. From a theoretical standpoint, STADE generalizes better across different scenarios. Finally, extensive experiments on Llama and Open Pre-trained Transformers (OPT) models validate these theoretical findings, showing that, depending on the training conditions, Wanda's optimal performance varies as predicted by the theoretical framework. These insights contribute to a more robust understanding of pruning strategies and their practical implications. Code is available at: https://github.com/Coello-dev/STADE/
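
The Wanda criterion that this work builds on scores each weight by its magnitude times the L2 norm of the corresponding input feature over calibration tokens, and prunes the lowest-scoring weights within each output row. The sketch below implements that score in plain Python. The `std_scores` variant, which swaps the norm for the per-feature standard deviation, is only an illustrative reading of "based on the standard deviation of the input"; the exact STADE criterion is defined in the paper and repository and may differ.

```python
import math
import statistics


def wanda_scores(W, X):
    """Wanda importance |W[i][j]| * ||X[:, j]||_2.

    W: weights as a list of output rows (out_features x in_features).
    X: calibration activations as a list of tokens (n_tokens x in_features).
    """
    norms = [math.sqrt(sum(x[j] ** 2 for x in X)) for j in range(len(W[0]))]
    return [[abs(w) * n for w, n in zip(row, norms)] for row in W]


def std_scores(W, X):
    """Hypothetical std-based variant: |W[i][j]| * std(X[:, j])."""
    stds = [statistics.pstdev([x[j] for x in X]) for j in range(len(W[0]))]
    return [[abs(w) * s for w, s in zip(row, stds)] for row in W]


def prune_per_row(W, scores, sparsity):
    """Zero out the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for row, srow in zip(W, scores):
        k = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: srow[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


W = [[1.0, 1.0]]
X = [[10.0, 1.0], [10.0, -1.0]]  # feature 0 is constant, feature 1 varies
print(prune_per_row(W, wanda_scores(W, X), 0.5))  # [[1.0, 0.0]]
print(prune_per_row(W, std_scores(W, X), 0.5))    # [[0.0, 1.0]]
```

The two criteria disagree exactly when a feature has a large mean but little variance, as feature 0 does in this toy input.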

2.0LGNov 30, 2023

Towards Comparable Active Learning

Thorben Werner, Johannes Burchert, Lars Schmidt-Thieme

Active Learning has received significant attention in the field of machine learning for its potential to select the most informative samples for labeling, thereby reducing data annotation costs. However, we show that the lifts reported in recent literature generalize poorly to other domains, leading to an inconclusive landscape in Active Learning research. Furthermore, we highlight overlooked problems in reproducing AL experiments that can lead to unfair comparisons and increased variance in the results. This paper addresses these issues by providing an Active Learning framework for a fair comparison of algorithms across different tasks and domains, as well as a fast and performant oracle algorithm for evaluation. To the best of our knowledge, we propose the first AL benchmark that tests algorithms in 3 major domains: Tabular, Image, and Text. We report empirical results for 6 widely used algorithms on 7 real-world and 2 synthetic datasets and aggregate them into a domain-specific ranking of AL algorithms.

11.5LGFeb 7, 2024Code

Moco: A Learnable Meta Optimizer for Combinatorial Optimization

Tim Dernedde, Daniela Thyssens, Sören Dittrich et al.

Relevant combinatorial optimization problems (COPs) are often NP-hard. While they have mainly been tackled via handcrafted heuristics in the past, advances in neural networks have motivated the development of general methods to learn heuristics from data. Many approaches use a neural network to directly construct a solution, but are limited in their ability to improve further on already constructed solutions at inference time. Our approach, Moco, defines a lightweight solution construction procedure guided by a single continuous vector $θ$ (called heatmap) and learns a neural network to update $θ$ for a single instance of a COP at inference time. The update is based on various features of the current search state. The training procedure is budget-aware, targeting the overall best solution found during the entire search. Moco is a fully learnable meta optimizer that neither uses problem-specific heuristics nor requires optimal solutions for training. We test Moco on the Traveling Salesman Problem (TSP) and Maximum Independent Set (MIS) and show that it significantly improves over other heatmap-based methods.
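
The abstract does not spell out Moco's construction procedure, so the following is only an assumed, generic example of heatmap-guided construction for the TSP: the next city is sampled among the unvisited ones with probability proportional to exp(θ[current][next]). Here θ is stored as an n×n edge matrix rather than a flat vector for readability. In Moco, a learned network would update θ between constructions; that meta-optimizer is not shown, and `construct_tour` and `tour_length` are hypothetical helpers.

```python
import math
import random


def construct_tour(theta, start=0, rng=None, greedy=False):
    """Build a TSP tour guided by an edge heatmap theta[i][j].

    The next city is drawn among unvisited cities with probability
    proportional to exp(theta[current][next]), or chosen as the argmax
    when greedy=True.
    """
    n = len(theta)
    rng = rng or random.Random(0)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        cur = tour[-1]
        cands = sorted(unvisited)
        if greedy:
            nxt = max(cands, key=lambda j: theta[cur][j])
        else:
            top = max(theta[cur][j] for j in cands)  # subtract max for numerical stability
            weights = [math.exp(theta[cur][j] - top) for j in cands]
            nxt = rng.choices(cands, weights=weights, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(tour, dist):
    """Length of the closed tour under the distance matrix dist."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


theta = [[0, 0, 5, 0], [0, 0, 0, 5], [0, 5, 0, 0], [5, 0, 0, 0]]
print(construct_tour(theta, greedy=True))  # [0, 2, 1, 3]
```

Sampling rather than always taking the argmax lets repeated constructions from the same heatmap explore different tours, which is what a search procedure that keeps the best solution found can exploit.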

14.4LGFeb 11, 2025

Physiome-ODE: A Benchmark for Irregularly Sampled Multivariate Time Series Forecasting Based on Biological ODEs

Christian Klötergens, Vijaya Krishna Yalavarthi, Randolf Scholz et al.

State-of-the-art methods for forecasting irregularly sampled time series with missing values rely predominantly on just four datasets and a few small toy examples for evaluation. While ordinary differential equations (ODEs) are the prevalent models in science and engineering, a baseline model that forecasts a constant value outperforms ODE-based models from the last five years on three of these existing datasets. This unintuitive finding hampers further research on ODE-based models, a more plausible model family. In this paper, we develop a methodology to generate irregularly sampled multivariate time series (IMTS) datasets from ordinary differential equations and to select challenging instances via rejection sampling. Using this methodology, we create Physiome-ODE, a large and sophisticated benchmark of IMTS datasets consisting of 50 individual datasets, derived from real-world ordinary differential equations from research in biology. To our knowledge, Physiome-ODE is the first benchmark for IMTS forecasting, and it is an order of magnitude larger than the current evaluation setting of four datasets. Using Physiome-ODE, we obtain qualitatively different results from those derived from the current four datasets: on Physiome-ODE, ODE-based models can play to their strengths, and the benchmark differentiates meaningfully between IMTS forecasting models. In this way, we expect to give new impetus to research on ODE-based time series modeling.
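
A minimal sketch of the generation step, assuming a simple explicit Euler integrator, uniformly random observation times, and independent per-channel dropout to create missing values. The Lotka-Volterra system is only a stand-in for the biological models the benchmark draws on, and the rejection-sampling criterion that selects challenging instances is not reproduced here; all function names are hypothetical.

```python
import random


def simulate(f, y0, t_end, dt=0.01):
    """Integrate dy/dt = f(y) with explicit Euler steps; returns (times, states)."""
    n_steps = round(t_end / dt)
    times, states = [0.0], [list(y0)]
    y = list(y0)
    for k in range(1, n_steps + 1):
        y = [yi + dt * di for yi, di in zip(y, f(y))]
        times.append(k * dt)
        states.append(list(y))
    return times, states


def irregular_sample(times, states, n_obs, p_missing, rng):
    """Pick n_obs random time points, then drop each channel value with prob. p_missing."""
    idx = sorted(rng.sample(range(len(times)), n_obs))
    return [
        (times[i], c, v)
        for i in idx
        for c, v in enumerate(states[i])
        if rng.random() >= p_missing
    ]


# Lotka-Volterra predator-prey dynamics as an example biological ODE.
def lotka_volterra(y):
    prey, pred = y
    return [prey * (1.0 - pred), pred * (prey - 1.0)]


times, states = simulate(lotka_volterra, [2.0, 1.0], t_end=10.0)
obs = irregular_sample(times, states, n_obs=20, p_missing=0.3, rng=random.Random(0))
```

Each element of `obs` is a (time, channel, value) triple, so different channels end up observed at different, irregular time points.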

4.6LGMar 6, 2024

ProbSAINT: Probabilistic Tabular Regression for Used Car Pricing

Kiran Madhusudhanan, Gunnar Behrens, Maximilian Stubbemann et al.

Used car pricing is a critical aspect of the automotive industry, influenced by many economic factors and market dynamics. With the recent surge in online marketplaces and increased demand for used cars, accurate pricing would benefit both buyers and sellers by ensuring fair transactions. However, the transition towards automated pricing algorithms using machine learning requires an understanding of model uncertainties, specifically the ability to flag predictions that the model is unsure about. Although recent literature proposes boosting algorithms or nearest-neighbor-based approaches for swift and precise price predictions, capturing model uncertainty with such algorithms is challenging. We introduce ProbSAINT, a model that offers a principled approach to uncertainty quantification for its price predictions, along with accurate point predictions that are comparable to state-of-the-art boosting techniques. Furthermore, since businesses prefer to price used cars based on the number of days a vehicle has been listed for sale, we show how ProbSAINT can be used as a dynamic forecasting model that predicts price probabilities for different expected offer durations. Our experiments further indicate that ProbSAINT is especially accurate on instances where it is highly certain. This demonstrates the applicability of its probabilistic predictions in real-world scenarios where trustworthiness is crucial.

7.1LGFeb 17, 2025

Mixing It Up: Exploring Mixer Networks for Irregular Multivariate Time Series Forecasting

Christian Klötergens, Vijaya Krishna Yalavarthi, Tim Dernedde et al.

Forecasting Irregular Multivariate Time Series (IMTS) has recently emerged as a distinct research field, requiring specialized models to address its unique challenges. While most forecasting literature assumes regularly spaced observations without missing values, many real-world datasets, particularly in healthcare, climate research, and biomechanics, violate these assumptions. Time Series (TS)-mixer models have achieved remarkable success in regular multivariate time series forecasting. However, they remain unexplored for IMTS due to their requirement for complete and evenly spaced observations. To bridge this gap, we introduce IMTS-Mixer, a novel forecasting architecture designed specifically for IMTS. Our approach retains the core principles of TS-mixer models while introducing new methods to transform IMTS into fixed-size matrix representations, enabling seamless integration with mixer modules. We evaluate IMTS-Mixer on a benchmark of four real-world datasets from various domains. Our results demonstrate that IMTS-Mixer establishes a new state-of-the-art in forecasting accuracy while also improving computational efficiency.

6.4LGApr 10, 2024

Are EEG Sequences Time Series? EEG Classification with Time Series Models and Joint Subject Training

Johannes Burchert, Thorben Werner, Vijaya Krishna Yalavarthi et al.

As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to treat such data like any other time series data. For EEG classification, many models have been developed with layer types and architectures we typically do not see in time series classification. Furthermore, a separate model is typically learned for each individual subject, rather than one model for all of them. In this paper, we systematically study the differences between EEG classification models and generic time series classification models. We describe three different model setups for handling EEG data from different subjects: subject-specific models (most EEG literature), subject-agnostic models, and subject-conditional models. In experiments on three datasets, we demonstrate that off-the-shelf time series classification models trained per subject perform close to EEG classification models, but do not quite reach the performance of domain-specific modeling. Additionally, we combine time series models with subject embeddings to train one joint subject-conditional classifier on all subjects. The resulting models are competitive with dedicated EEG models on 2 out of 3 datasets, even outperforming all EEG methods on one of them.

4.6LGMar 7, 2024Code

Hyperparameter Tuning MLPs for Probabilistic Time Series Forecasting

Kiran Madhusudhanan, Shayan Jawed, Lars Schmidt-Thieme

Time series forecasting attempts to predict future events by analyzing past trends and patterns. Although well researched, certain critical aspects of using deep learning for time series forecasting remain unclear. Our research focuses primarily on the impact of time-series-specific hyperparameters, such as context length and validation strategy, on the performance of the state-of-the-art MLP model in time series forecasting. We have conducted a comprehensive series of experiments involving 4800 configurations per dataset across 20 time series forecasting datasets, and our findings demonstrate the importance of tuning these parameters. Furthermore, in this work, we introduce the largest metadataset for time series forecasting to date, named TSBench, comprising 97200 evaluations, a twentyfold increase compared to previous works in the field. Finally, we demonstrate the utility of the created metadataset on multi-fidelity hyperparameter optimization tasks.

6.4LGFeb 9, 2024Code

Probabilistic Forecasting of Irregular Time Series via Conditional Flows

Vijaya Krishna Yalavarthi, Randolf Scholz, Stefan Born et al.

Probabilistic forecasting of irregularly sampled multivariate time series with missing values is an important problem in many fields, including health care, astronomy, and climate. State-of-the-art methods for this task estimate only marginal distributions of observations in single channels and at single time points, assuming a fixed-shape parametric distribution. In this work, we propose a novel model, ProFITi, for probabilistic forecasting of irregularly sampled time series with missing values using conditional normalizing flows. The model learns joint distributions over the future values of the time series conditioned on past observations and queried channels and times, without assuming any fixed shape of the underlying distribution. As model components, we introduce a novel invertible triangular attention layer and an invertible non-linear activation function that maps the whole real line onto itself. We conduct extensive experiments on four datasets and demonstrate that the proposed model achieves a $4$ times higher likelihood than the previously best model.
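
The invertible triangular layer relies on a standard normalizing-flow fact: if z = L y with L lower triangular and a positive diagonal, the map is invertible by forward substitution, and log|det L| is the sum of the logs of the diagonal entries. The sketch below shows that generic mechanism with a fixed matrix and a standard normal base density. In ProFITi the triangular matrix is produced by attention and combined with other layers, none of which is modeled here; the helper names are hypothetical.

```python
import math


def tri_forward(L, y):
    """z = L @ y for lower-triangular L (row i only uses y[0..i])."""
    return [sum(L[i][j] * y[j] for j in range(i + 1)) for i in range(len(y))]


def tri_inverse(L, z):
    """Solve L @ y = z by forward substitution."""
    y = []
    for i in range(len(z)):
        s = sum(L[i][j] * y[j] for j in range(i))
        y.append((z[i] - s) / L[i][i])
    return y


def log_likelihood(L, y):
    """log p(y) for z = L y with z ~ N(0, I): log N(z) + sum(log L[i][i])."""
    z = tri_forward(L, y)
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    log_det = sum(math.log(L[i][i]) for i in range(len(y)))  # diagonal must be > 0
    return log_base + log_det


L = [[2.0, 0.0], [0.5, 1.0]]
z = tri_forward(L, [1.0, -3.0])  # [2.0, -2.5]
print(tri_inverse(L, z))         # [1.0, -3.0]
```

The triangular structure is what keeps both the inverse and the determinant cheap: each output depends only on earlier inputs, so the Jacobian is the triangular matrix itself.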

4.1LGSep 5, 2025

Recurrent State Encoders for Efficient Neural Combinatorial Optimization

Tim Dernedde, Daniela Thyssens, Lars Schmidt-Thieme

Construction methods are the primary paradigm in Neural Combinatorial Optimization (NCO): a neural network is trained to sequentially add one solution component at a time until a complete solution is constructed. We observe that the state typically changes little between two steps, since usually only the node that is added to the solution is removed from the state. An efficient model should therefore be able to reuse computation done in prior steps. To that end, we propose to train a recurrent encoder that computes the state embeddings based not only on the state but also on the embeddings of the previous step. We show that the recurrent encoder can achieve equivalent or better performance than a non-recurrent encoder even if it consists of $3\times$ fewer layers, thus significantly improving latency. We demonstrate our findings on three different problems, the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Orienteering Problem (OP), and integrate the models into a large neighborhood search algorithm to showcase the practical relevance of our findings.

4.1LGAug 4, 2025

On Distributional Dependent Performance of Classical and Neural Routing Solvers

Daniela Thyssens, Tim Dernedde, Wilson Sentanoe et al.

Neural Combinatorial Optimization aims to learn to solve a class of combinatorial problems through data-driven methods, notably by employing neural networks that learn the underlying distribution of problem instances. While neural methods have so far struggled to outperform highly engineered, problem-specific meta-heuristics, this work explores a novel approach to formulating the distribution of problem instances to learn from and, more importantly, to planting a structure in the sampled problem instances. In application to routing problems, we generate large problem instances that represent custom base problem instance distributions from which training instances are sampled. The test instances used to evaluate the methods on the routing task consist of unseen problems sampled from the underlying large problem instance. We evaluate representative NCO methods and specialized Operations Research meta-heuristics on this novel task and demonstrate that the performance gap between neural routing solvers and highly specialized meta-heuristics decreases when learning from sub-samples drawn from a fixed base node distribution.

4.1LGAug 1, 2025

The Role of Active Learning in Modern Machine Learning

Thorben Werner, Lars Schmidt-Thieme, Vijaya Krishna Yalavarthi

Even though Active Learning (AL) is widely studied, it is rarely applied in contexts outside its own scientific literature. We posit that the reason for this is AL's high computational cost coupled with the comparatively small lifts it typically generates in scenarios with few labeled points. In this work, we study the impact of different methods for combating this low-data scenario, namely data augmentation (DA), semi-supervised learning (SSL), and AL. We find that AL is by far the least efficient method of solving the low-data problem, generating a lift of only 1-4% over random sampling, while DA and SSL methods can generate up to 60% lift in combination with random sampling. However, when AL is combined with strong DA and SSL techniques, it is, surprisingly, still able to provide improvements. Based on these results, we frame AL not as a method to combat missing labels, but as the final building block to squeeze the last bits of performance out of data after appropriate DA and SSL methods have been applied.

4.1LGJan 2, 2025Code

Bayesian Active Learning By Distribution Disagreement

Thorben Werner, Lars Schmidt-Thieme

Active Learning (AL) for regression has been systematically under-researched due to the increased difficulty of measuring uncertainty in regression models. Since normalizing flows offer a full predictive distribution instead of a point forecast, they facilitate direct usage of known heuristics for AL like Entropy or Least-Confident sampling. However, we show that most of these heuristics do not work well for normalizing flows in pool-based AL and we need more sophisticated algorithms to distinguish between aleatoric and epistemic uncertainty. In this work we propose BALSA, an adaptation of the BALD algorithm, tailored for regression with normalizing flows. With this work we extend current research on uncertainty quantification with normalizing flows \cite{berry2023normalizing, berry2023escaping} to real world data and pool-based AL with multiple acquisition functions and query sizes. We report SOTA results for BALSA across 4 different datasets and 2 different architectures.

2.6LGJun 11, 2024

Marginalization Consistent Probabilistic Forecasting of Irregular Time Series via Mixture of Separable flows

Vijaya Krishna Yalavarthi, Randolf Scholz, Christian Kloetergens et al.

Probabilistic forecasting models for joint distributions of targets in irregular time series with missing values are a heavily under-researched area in machine learning; to the best of our knowledge, only two models have been studied so far: the Gaussian Process Regression model and ProFITi. While ProFITi, thanks to its use of multivariate normalizing flows, is very expressive and achieves better predictive performance, it suffers from marginalization inconsistency: it does not guarantee that the marginal distributions of a subset of variables in its predictive distributions coincide with the directly predicted distributions of these variables. When asked to predict marginal distributions directly, ProFITi is often vastly inaccurate. We propose MOSES (Marginalization Consistent Mixture of Separable Flows), a model that parametrizes a stochastic process through a mixture of several latent multivariate Gaussian Processes combined with separable univariate Normalizing Flows. In particular, MOSES can be marginalized analytically, allowing it to directly answer a wider range of probabilistic queries than most competitors. Experiments on four datasets show that MOSES achieves both accurate joint and marginal predictions, surpassing all other marginalization consistent baselines; it trails only slightly behind ProFITi in joint prediction but is vastly superior when predicting marginal distributions.
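
Analytic marginalization is what makes a consistency guarantee possible: for a mixture of multivariate Gaussians, the marginal over a subset of variables is again a mixture with the same weights, where each component keeps the corresponding entries of its mean and covariance. If, as the description of separable univariate flows suggests, each flow acts on one variable independently, marginalizing additionally just means dropping the flows of the removed variables. The sketch below covers only the Gaussian-mixture part, with a hypothetical `marginalize` helper, and checks that marginalizing in two steps matches marginalizing directly.

```python
def marginalize(weights, means, covs, keep):
    """Marginal of a Gaussian mixture over the variable indices in `keep`.

    Each component's marginal is Gaussian with the sub-vector of its mean and
    the sub-matrix of its covariance; the mixture weights are unchanged.
    """
    sub_means = [[m[i] for i in keep] for m in means]
    sub_covs = [[[c[i][j] for j in keep] for i in keep] for c in covs]
    return list(weights), sub_means, sub_covs


weights = [0.3, 0.7]
means = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
covs = [
    [[1.0, 0.1, 0.2], [0.1, 2.0, 0.3], [0.2, 0.3, 3.0]],
    [[2.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 1.5]],
]

# Marginalize to variables {0, 2}, then to the second of those (original index 2) ...
two_step = marginalize(*marginalize(weights, means, covs, [0, 2]), [1])
# ... which matches marginalizing to variable 2 directly.
print(two_step == marginalize(weights, means, covs, [2]))  # True
```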

4.6LGMay 6, 2024Code

Functional Latent Dynamics for Irregularly Sampled Time Series Forecasting

Christian Klötergens, Vijaya Krishna Yalavarthi, Maximilian Stubbemann et al.

Irregularly sampled time series with missing values are observed in many real-world applications such as healthcare, climate, and astronomy. They pose a significant challenge to standard deep learning models that operate only on fully observed and regularly sampled time series. To capture the continuous dynamics of irregular time series, many models solve an Ordinary Differential Equation (ODE) in the hidden state. These ODE-based models tend to be slow and require large amounts of memory due to sequential operations and a complex ODE solver. As an alternative to complex ODE-based models, we propose a family of models called Functional Latent Dynamics (FLD). Instead of solving an ODE, we use simple curves that are defined at all time points to specify the continuous latent state of the model. The coefficients of these curves are learned only from the observed values in the time series, ignoring the missing values. Through extensive experiments, we demonstrate that FLD achieves better performance than the best ODE-based model while reducing runtime and memory overhead. Specifically, FLD requires an order of magnitude less time to infer forecasts than the best-performing forecasting model.
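
To illustrate the idea of a curve that is defined at every time point but fitted only to observed values, the sketch below fits a straight line per channel by ordinary least squares, skipping missing entries, and then evaluates it at an arbitrary query time. This is a deliberately simplified stand-in: FLD learns its curve coefficients with a neural model and may use other curve families, so `fit_line` and `evaluate` are hypothetical helpers rather than the FLD method.

```python
def fit_line(times, values):
    """Least-squares fit of value = a + b * t using only observed (non-None) points.

    Requires at least one observed value.
    """
    pts = [(t, v) for t, v in zip(times, values) if v is not None]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in pts)
    if var_t == 0:
        return mean_v, 0.0  # only one distinct time point: constant curve
    b = sum((t - mean_t) * (v - mean_v) for t, v in pts) / var_t
    return mean_v - b * mean_t, b


def evaluate(coeffs, t):
    """The fitted curve exists at every t, not just at observation times."""
    a, b = coeffs
    return a + b * t


coeffs = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, None, 5.0, 7.0])
print(evaluate(coeffs, 10.0))  # close to 21.0, since the observed points lie on 1 + 2t
```

Evaluating a closed-form curve is a single arithmetic expression per query time, which is the kind of cost saving the abstract contrasts with stepping an ODE solver.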

3.8LGMay 22, 2023Code

Forecasting Irregularly Sampled Time Series using Graphs

Vijaya Krishna Yalavarthi, Kiran Madhusudhanan, Randolf Scholz et al.

Forecasting irregularly sampled time series with missing values is a crucial task for numerous real-world applications such as healthcare, astronomy, and climate sciences. State-of-the-art approaches to this problem rely on Ordinary Differential Equations (ODEs), which are known to be slow and often require additional features to handle missing values. To address this issue, we propose a novel model using Graphs for Forecasting Irregularly Sampled Time Series with missing values, which we call GraFITi. GraFITi first converts the time series into a Sparsity Structure Graph, a sparse bipartite graph, and then reformulates the forecasting problem as an edge weight prediction task in this graph. It uses Graph Neural Networks to learn the graph and predict the target edge weights. GraFITi has been tested on 3 real-world and 1 synthetic irregularly sampled time series datasets with missing values and compared with various state-of-the-art models. The experimental results demonstrate that GraFITi improves forecasting accuracy by up to 17% and reduces run time by up to a factor of 5 compared to state-of-the-art forecasting models.
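
A minimal sketch of the graph construction, assuming the Sparsity Structure Graph has one node per distinct time point and one per channel, with an edge for every observed value (weighted by that value) and an unweighted target edge for every (time, channel) pair to be forecast. The function name and this exact layout are assumptions based on the description above; the GNN that predicts target edge weights is not shown.

```python
def sparsity_structure_graph(observations, queries):
    """Build a bipartite graph between time nodes and channel nodes.

    observations: list of (time, channel, value) for observed entries.
    queries: list of (time, channel) pairs whose values should be forecast.
    Returns node lists plus observed edges (with weights) and target edges (without).
    """
    times = sorted({t for t, _, _ in observations} | {t for t, _ in queries})
    channels = sorted({c for _, c, _ in observations} | {c for _, c in queries})
    t_index = {t: i for i, t in enumerate(times)}
    c_index = {c: i for i, c in enumerate(channels)}
    observed_edges = [(t_index[t], c_index[c], v) for t, c, v in observations]
    target_edges = [(t_index[t], c_index[c]) for t, c in queries]
    return times, channels, observed_edges, target_edges


obs = [(0.0, "hr", 80.0), (0.5, "bp", 120.0), (1.0, "hr", 82.0)]
queries = [(2.0, "hr"), (2.0, "bp")]
times, channels, observed, targets = sparsity_structure_graph(obs, queries)
print(observed)  # [(0, 1, 80.0), (1, 0, 120.0), (2, 1, 82.0)]
print(targets)   # [(3, 1), (3, 0)]
```

Because missing values simply have no edge, the number of edges grows with the number of observations rather than with the full time × channel grid.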

2.6CVFeb 25, 2022

Improving Amharic Handwritten Word Recognition Using Auxiliary Task

Mesay Samuel Gondere, Lars Schmidt-Thieme, Durga Prasad Sharma et al.

Amharic is one of the official languages of the Federal Democratic Republic of Ethiopia. It is one of the languages that use the Ethiopic script, which is derived from Ge'ez, an ancient language that is currently used as a liturgical language. Amharic is also one of the most widely used literature-rich languages of Ethiopia. Innovative, customized research on Amharic optical character recognition (OCR) in general, and on Amharic handwritten text recognition in particular, remains very limited. In this study, we investigate Amharic handwritten word recognition. State-of-the-art deep learning techniques, including convolutional neural networks together with recurrent neural networks and the connectionist temporal classification (CTC) loss, were used to perform recognition in an end-to-end fashion. More importantly, an innovative way of complementing the loss function with an auxiliary task based on the row-wise similarities of the Amharic alphabet was tested and showed a significant recognition improvement over a baseline method. Such findings will promote innovative problem-specific solutions and may offer insight toward generalized solutions that emerge from problem-specific domains.

4.6LGFeb 11, 2022

Positive-Unlabeled Domain Adaptation

Jonas Sonntag, Gunnar Behrens, Lars Schmidt-Thieme

Domain Adaptation methods have been shown to generalize effectively from a labeled source domain to a label-scarce target domain. Previous research has focused either on unsupervised domain adaptation without any target supervision or on semi-supervised domain adaptation with a few labeled target examples per class. Meanwhile, Positive-Unlabeled (PU) Learning has attracted increasing interest in the weakly supervised learning literature, since in many real-world applications positive labels are much easier to obtain than negative ones. In this work, we are the first to introduce the challenge of Positive-Unlabeled Domain Adaptation, where we aim to generalize from a fully labeled source domain to a target domain where only positive and unlabeled data are available. We present a novel two-step learning approach to this problem: first, we identify reliable positive and negative pseudo-labels in the target domain, guided by source domain labels and a positive-unlabeled risk estimator. This enables us to use a standard classifier on the target domain in a second step. We validate our approach by running experiments on benchmark datasets for visual object recognition. Furthermore, we propose real-world examples for our setting and demonstrate superior performance on parking occupancy data.
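
The abstract does not specify which positive-unlabeled risk estimator is used. As a well-known example, the sketch below implements the non-negative PU (nnPU) risk of Kiryo et al. (2017) with a sigmoid loss, which estimates the classification risk from classifier scores on positive and unlabeled data given the class prior π; whether the paper uses this exact estimator is an assumption, and the helper names are hypothetical.

```python
import math


def sigmoid_loss(score, label):
    """l(z, y) = 1 / (1 + exp(y * z)) for labels y in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(xs):
    return sum(xs) / len(xs)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk for classifier scores on positive and unlabeled data."""
    # Risk of misclassifying labeled positives, weighted by the class prior.
    risk_pos = prior * mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Negative-class risk: unlabeled data treated as negative, minus the
    # contribution of the positives hidden in the unlabeled set.
    risk_neg = mean([sigmoid_loss(s, -1) for s in unl_scores]) - prior * mean(
        [sigmoid_loss(s, -1) for s in pos_scores]
    )
    # Clamping at zero is what makes the estimator "non-negative".
    return risk_pos + max(0.0, risk_neg)


print(nn_pu_risk([0.0], [0.0], prior=0.4))  # approximately 0.5
```

Without the clamp, a flexible classifier can drive the estimated negative-class risk below zero and overfit; the `max(0.0, ...)` term prevents that.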

2.5AIFeb 9, 2022

A.I. and Data-Driven Mobility at Volkswagen Financial Services AG

Shayan Jawed, Mofassir ul Islam Arif, Ahmed Rashed et al.

Machine learning is being widely adopted in industrial applications owing to the capabilities of commercially available hardware and rapidly advancing research. Volkswagen Financial Services (VWFS), as a market leader in vehicle leasing services, aims to leverage existing proprietary data and the latest research to enhance existing business processes and derive new ones. The collaboration between the Information Systems and Machine Learning Lab (ISMLL) and VWFS serves to realize this goal. In this paper, we propose methods in the fields of recommender systems, object detection, and forecasting that enable data-driven decisions across the vehicle life-cycle at VWFS.

4.6LGJan 5, 2022

Supervised Permutation Invariant Networks for Solving the CVRP with Bounded Fleet Size

Daniela Thyssens, Jonas Falkner, Lars Schmidt-Thieme

Learning to solve combinatorial optimization problems, such as the vehicle routing problem, offers great computational advantages over classical operations research solvers and heuristics. The recently developed deep reinforcement learning approaches either iteratively improve an initially given solution or sequentially construct a set of individual tours. However, most existing learning-based approaches cannot work with a fixed number of vehicles and thus bypass the complex problem of assigning customers to an a priori given number of available vehicles. This makes them less suitable for real applications, as many logistics service providers rely on solutions for a specific, bounded fleet size and cannot accommodate short-term changes to the number of vehicles. In contrast, we propose a powerful supervised deep learning framework that constructs a complete tour plan from scratch while respecting an a priori fixed number of available vehicles. In combination with an efficient post-processing scheme, our supervised approach is not only much faster and easier to train but also achieves competitive results that incorporate the practical aspect of vehicle costs. In thorough controlled experiments, we compare our method to multiple state-of-the-art approaches, demonstrating stable performance while using fewer vehicles, and shed light on existing inconsistencies in the experimentation protocols of related work.

1.6LGOct 15, 2021

Improving Hyperparameter Optimization by Planning Ahead

Hadi S. Jomaa, Jonas Falkner, Lars Schmidt-Thieme

Hyperparameter optimization (HPO) is generally treated as a bi-level optimization problem that involves fitting a (probabilistic) surrogate model to a set of observed hyperparameter responses, e.g. validation loss, and consequently maximizing an acquisition function using a surrogate model to identify good hyperparameter candidates for evaluation. The choice of a surrogate and/or acquisition function can be further improved via knowledge transfer across related tasks. In this paper, we propose a novel transfer learning approach, defined within the context of model-based reinforcement learning, where we represent the surrogate as an ensemble of probabilistic models that allows trajectory sampling. We further propose a new variant of model predictive control which employs a simple look-ahead strategy as a policy that optimizes a sequence of actions, representing hyperparameter candidates to expedite HPO. Our experiments on three meta-datasets comparing to state-of-the-art HPO algorithms including a model-free reinforcement learning approach show that the proposed method can outperform all baselines by exploiting a simple planning-based policy.

11.9LGOct 13, 2021Code

Yformer: U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting

Kiran Madhusudhanan, Johannes Burchert, Nghia Duong-Trung et al.

Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the far future allows us to make effective decisions. Recent research has witnessed the superior performance of transformer-based architectures, especially in the regime of far horizon time series forecasting. However, current state-of-the-art sparse Transformer architectures fail to couple down- and upsampling procedures to produce outputs at a resolution similar to that of the input. We propose the Yformer model, based on a novel Y-shaped encoder-decoder architecture that (1) uses direct connections from the downscaled encoder layer to the corresponding upsampled decoder layer in a U-Net inspired architecture, (2) combines the downscaling/upsampling with sparse attention to capture long-range effects, and (3) stabilizes the encoder-decoder stacks with the addition of an auxiliary reconstruction loss. Extensive experiments with relevant baselines on four benchmark datasets demonstrate average improvements of 19.82% and 18.41% in MSE and 13.62% and 11.85% in MAE over the current state of the art in the univariate and multivariate settings, respectively.

1.4 · CV · Sep 3, 2021

Deep Metric Learning for Ground Images

Raaghav Radhakrishnan, Jan Fabian Schmid, Randolf Scholz et al.

Ground-texture-based localization methods are promising candidates for low-cost, high-accuracy self-localization of robots. These methods estimate the pose of a given query image, i.e., the current observation of the ground from a downward-facing camera, with respect to a set of reference images whose poses in the application area are known. In this work, we address the initial localization task, in which there is no prior knowledge of the robot's current position. In this situation, the localization method would have to consider all available reference images. However, to reduce computational effort and the risk of obtaining a wrong result, we would like to consider only those reference images that actually overlap with the query image. For this purpose, we propose a deep metric learning approach that retrieves the reference images most similar to the query image. Compared with existing approaches to image retrieval for ground images, our approach achieves significantly better recall and improves the localization performance of a state-of-the-art ground-texture-based localization method.
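
Retrieval with deep metric learning generally trains an embedding so that overlapping query/reference pairs lie close together and non-overlapping pairs lie far apart, then ranks references by embedding distance. The sketch below assumes a triplet margin loss (the abstract does not name the loss used) and uses precomputed embedding vectors in place of a CNN; `triplet_loss`, `retrieve`, `recall_at_k`, and `margin` are hypothetical names.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss: anchor = query embedding, positive = overlapping
    reference, negative = non-overlapping reference."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

def retrieve(query_emb, ref_embs, k=3):
    """Indices of the k reference embeddings closest to the query (Euclidean)."""
    dists = np.linalg.norm(ref_embs - query_emb, axis=1)
    return np.argsort(dists)[:k]

def recall_at_k(retrieved, overlapping):
    """Fraction of truly overlapping references found among the retrieved ones."""
    overlapping = set(overlapping)
    return len(overlapping & set(retrieved.tolist())) / len(overlapping)

refs = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.1, 0.0]])
top = retrieve(np.array([0.0, 0.0]), refs, k=2)
print(top, recall_at_k(top, {0, 3}))  # → [0 3] 1.0
```

Only the retrieved references would then be passed to the downstream localization method, which is where the savings in computation come from.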

7.5 · LG · Aug 5, 2021

Multimodal Meta-Learning for Time Series Regression

Sebastian Pineda Arango, Felix Heinrich, Kiran Madhusudhanan et al.

Recent work has shown the effectiveness of deep learning models such as Fully Convolutional Networks (FCNs) and Recurrent Neural Networks (RNNs) for Time Series Regression (TSR) problems. These models can require large amounts of data to generalize, yet individual time series are sometimes too short for patterns to be learned from them alone. It is therefore important to use information across time series to improve learning. In this paper, we explore using meta-learning to quickly adapt model parameters to new time series with short histories by modifying the original idea of Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017). Moreover, building on prior work on multimodal MAML (Vuorio et al., 2019), we propose a method for conditioning the model parameters through an auxiliary network that encodes global information of the time series to extract meta-features. Finally, we apply the method to time series from different domains, such as pollution measurements, heart-rate sensors, and electrical battery data. We show empirically that our proposed meta-learning method learns TSR quickly from few data and outperforms the baselines in 9 of 12 experiments.
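
The core MAML idea, learning an initialization from which a few gradient steps on a new short series suffice, can be sketched with a linear regressor standing in for an FCN or RNN. This is first-order MAML (query gradients are taken at the adapted weights and second-order terms are ignored), a simplification of the original algorithm; the multimodal conditioning network is omitted, and all names and learning rates are hypothetical.

```python
import numpy as np

def predict(w, x):
    # Linear stand-in for an FCN/RNN regressor: y_hat = w[0] * x + w[1].
    return w[0] * x + w[1]

def mse_and_grad(w, x, y):
    err = predict(w, x) - y
    grad = np.array([np.mean(2 * err * x), np.mean(2 * err)])
    return float(np.mean(err ** 2)), grad

def adapt(w, x_support, y_support, inner_lr=0.1, steps=1):
    # Inner loop: a few gradient steps on a task's short support window.
    for _ in range(steps):
        _, g = mse_and_grad(w, x_support, y_support)
        w = w - inner_lr * g
    return w

def fomaml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    # Outer loop (first-order MAML): average query gradients at the adapted weights.
    meta_grad = np.zeros_like(w)
    for x_s, y_s, x_q, y_q in tasks:
        _, g_q = mse_and_grad(adapt(w, x_s, y_s, inner_lr), x_q, y_q)
        meta_grad += g_q
    return w - outer_lr * meta_grad / len(tasks)

# Usage: meta-train on synthetic linear "series" with random slopes and offsets.
rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(200):
    tasks = []
    for _ in range(8):
        a, b = rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)
        x_s, x_q = rng.uniform(-1, 1, 5), rng.uniform(-1, 1, 10)
        tasks.append((x_s, a * x_s + b, x_q, a * x_q + b))
    w = fomaml_step(w, tasks)
print(w)
```

Each task here has only five support points, mirroring the short-history setting; the meta-learned `w` is an initialization meant to be adapted, not a model for any single task.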