Oliver Struckmeier

LG
8 papers
37 citations
Novelty 46%
AI Score 26

8 Papers

LG · Oct 17, 2023 · Code
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport

Quentin Bouniot, Ievgen Redko, Anton Mallasto et al.

In the last decade, we have witnessed the introduction of several novel deep neural network (DNN) architectures exhibiting ever-increasing performance across diverse tasks. Explaining the upward trend of their performance, however, remains difficult, as different DNN architectures of comparable depth and width -- common factors associated with their expressive power -- may exhibit drastically different performance even when trained on the same dataset. In this paper, we introduce the concept of the non-linearity signature of a DNN, the first theoretically sound solution for approximately measuring the non-linearity of deep neural networks. Built upon a score derived from closed-form optimal transport mappings, this signature provides a better understanding of the inner workings of a wide range of DNN architectures and learning paradigms, with a particular emphasis on computer vision tasks. We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature and its potential for far-reaching implications. The code for our work is available at https://github.com/qbouniot/AffScoreDeep

LG · May 12, 2023 · Code
Learning representations that are closed-form Monge mapping optimal with application to domain adaptation

Oliver Struckmeier, Ievgen Redko, Anton Mallasto et al.

Optimal transport (OT) is a powerful geometric tool used to compare and align probability measures following the least effort principle. Despite its widespread use in machine learning (ML), the OT problem still carries a heavy computational burden, while at the same time suffering from the curse of dimensionality for measures supported on general high-dimensional spaces. In this paper, we propose to tackle these challenges using representation learning. In particular, we seek to learn an embedding space such that the samples of the two input measures become alignable in it with a simple affine mapping that can be calculated efficiently in closed form. We then show that such an approach leads to results that are comparable to solving the original OT problem when applied to the transfer learning task on which many OT baselines were previously evaluated, in both homogeneous and heterogeneous domain adaptation (DA) settings. The code for our contribution is available at https://github.com/Oleffa/LaOT
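As an illustration of what "a simple affine mapping that can be calculated efficiently in closed form" can look like, the sketch below fits the well-known closed-form Monge map between Gaussian approximations of two sample sets, T(x) = m_t + A(x - m_s) with A = S_s^{-1/2} (S_s^{1/2} S_t S_s^{1/2})^{1/2} S_s^{-1/2}. It is a minimal sketch, not the paper's implementation: the function names, the `eps` regularizer, and the toy data are assumptions made for this example, and no embedding is learned here.

```python
import numpy as np

def psd_sqrt(mat):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def affine_monge_map(source, target, eps=1e-6):
    """Fit T(x) = m_t + A (x - m_s) from Gaussian fits of two (n, d) sample sets.

    eps is a hypothetical ridge term that keeps the covariances invertible.
    """
    m_s, m_t = source.mean(axis=0), target.mean(axis=0)
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    root_s = psd_sqrt(cov_s)
    inv_root_s = np.linalg.inv(root_s)
    A = inv_root_s @ psd_sqrt(root_s @ cov_t @ root_s) @ inv_root_s
    # A is symmetric, so right-multiplying row vectors by A.T applies A to each sample.
    return lambda x: m_t + (x - m_s) @ A.T

# Toy data (assumed for illustration): the target is an affine image of the source.
rng = np.random.default_rng(0)
source = rng.normal(size=(500, 2))
target = source @ np.array([[2.0, 0.5], [0.0, 1.0]]).T + np.array([3.0, -1.0])

transport = affine_monge_map(source, target)
mapped = transport(source)  # mean and covariance now match those of the target
```

The map matches the first two moments of the target; whether samples are well aligned beyond that depends on how close the two measures are to Gaussian in the chosen space.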

LG · Mar 12, 2021
Domain Curiosity: Learning Efficient Data Collection Strategies for Domain Adaptation

Karol Arndt, Oliver Struckmeier, Ville Kyrki

Domain adaptation is a common problem in robotics, with applications such as transferring policies from simulation to the real world and lifelong learning. Performing such adaptation, however, requires informative data about the environment to be available during the adaptation. In this paper, we present domain curiosity -- a method of training exploratory policies that are explicitly optimized to provide data that allows a model to learn about the unknown aspects of the environment. In contrast to most curiosity methods, our approach explicitly rewards learning, which makes it robust to environment noise without sacrificing its ability to learn. We evaluate the proposed method by measuring how much a model can learn about environment dynamics from data collected by the proposed approach, compared to standard curious and random policies. The evaluation is performed using a toy environment, two simulated robot setups, and a real-world haptic exploration task. The results show that the proposed method allows data-efficient and accurate estimation of dynamics.

LG · Dec 11, 2020
Autoencoding Slow Representations for Semi-supervised Data Efficient Regression

Oliver Struckmeier, Kshitij Tiwari, Ville Kyrki

The slowness principle is a concept inspired by the visual cortex of the brain. It postulates that the underlying generative factors of a quickly varying sensory signal change on a slower time scale. Unsupervised learning of intermediate representations utilizing abundant unlabeled sensory data can be leveraged to perform data-efficient supervised downstream regression. In this paper, we propose a general formulation of slowness for unsupervised representation learning, adding a slowness regularization term to the evidence lower bound of the beta-VAE to encourage temporal similarity in observation and latent space. Within this framework we compare existing slowness regularization terms, such as the L1 and L2 losses used in existing end-to-end methods and the SlowVAE, and propose a new term based on Brownian motion. We empirically evaluate these slowness regularization terms with respect to their downstream task performance and data efficiency. We find that slow representations lead to equal or better downstream task performance and data efficiency in different experiment domains when compared to representations without slowness regularization. Finally, we discuss how the Frechet Inception Distance (FID), traditionally used to determine the generative capabilities of GANs, can serve as a measure to predict the performance of a pre-trained autoencoder model in a supervised downstream task and accelerate hyperparameter search.
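To make the idea of adding a slowness term to a beta-VAE objective concrete, here is a minimal sketch of one of the simpler variants mentioned above, an L2 penalty on consecutive latent means. It is an assumption-laden illustration rather than the paper's formulation: the function name, the squared-error reconstruction term, the weights `beta` and `gamma`, and the restriction to latent space only are all choices made for this example, and the Brownian-motion term is not shown.

```python
import numpy as np

def beta_vae_slowness_loss(x, x_recon, mu, log_var, beta=4.0, gamma=1.0):
    """Negative beta-VAE objective plus an L2 slowness penalty (to be minimized).

    x, x_recon : (T, D) arrays of T temporally consecutive frames and reconstructions.
    mu, log_var: (T, K) arrays, mean and log-variance of the Gaussian posterior.
    beta, gamma: hypothetical weights for the KL and slowness terms.
    """
    # Squared-error reconstruction term, averaged over time steps.
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # KL divergence between N(mu, diag(exp(log_var))) and the standard normal prior.
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    # L2 slowness: penalize latent changes between consecutive frames.
    slowness = np.mean(np.sum((mu[1:] - mu[:-1]) ** 2, axis=1))
    return recon + beta * kl + gamma * slowness

# Example call on random stand-in data (shapes only; not real encoder outputs).
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))
loss = beta_vae_slowness_loss(frames, frames * 0.9, rng.normal(size=(8, 3)), np.zeros((8, 3)))
```

Setting `gamma=0` recovers a plain beta-VAE loss, which is the baseline that slowness-regularized representations are compared against.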

RO · Sep 16, 2019
MuPNet: Multi-modal Predictive Coding Network for Place Recognition by Unsupervised Learning of Joint Visuo-Tactile Latent Representations

Oliver Struckmeier, Kshitij Tiwari, Shirin Dora et al.

Extracting and binding salient information from different sensory modalities to determine common features in the environment is a significant challenge in robotics. Here we present MuPNet (Multi-modal Predictive Coding Network), a biologically plausible network architecture for extracting joint latent features from visuo-tactile sensory data gathered from a biomimetic mobile robot. In this study we evaluate MuPNet applied to place recognition as a simulated biomimetic robot platform explores visually aliased environments. The F1 scores demonstrate that its performance is equivalent to that of prior hand-crafted sensory feature extraction techniques under controlled conditions, with significant improvement when operating in novel environments.

RO · Jun 14, 2019
ViTa-SLAM: A Bio-inspired Visuo-Tactile SLAM for Navigation while Interacting with Aliased Environments

Oliver Struckmeier, Kshitij Tiwari, Mohammed Salman et al.

RatSLAM is a rat hippocampus-inspired visual Simultaneous Localization and Mapping (SLAM) framework capable of generating semi-metric topological representations of indoor and outdoor environments. Whisker-RatSLAM is a 6D extension of RatSLAM that primarily focuses on object recognition by generating point clouds of objects based on whisking information. This paper introduces a novel extension of both earlier works, referred to as ViTa-SLAM, which harnesses both visual and tactile information to perform SLAM. This not only allows the robot to interact naturally with the environment whilst navigating, as is normally seen in nature, but also provides a mechanism to fuse non-unique tactile and unique visual data. Compared to the earlier works, our approach can handle ambiguous scenes in which one sensor alone is not capable of identifying false-positive loop closures.

CV · May 28, 2019
LeagueAI: Improving object detector performance and flexibility through automatically generated training data and domain randomization

Oliver Struckmeier

In this technical report I present my method for automatic synthetic dataset generation for object detection and demonstrate it on the video game League of Legends. This report furthermore serves as a handbook on how to automatically generate datasets and as an introduction to the dataset generation part of the LeagueAI framework. The LeagueAI framework is a software framework that provides detailed information about the game League of Legends based on the same input a human player would have, namely vision. The framework allows researchers and enthusiasts to develop their own intelligent agents or to extract detailed information about the state of the game. A major problem in machine vision applications is usually the laborious work of gathering large amounts of hand-labeled data. Thus, a crucial part of the vision pipeline of the LeagueAI framework, the dataset generation, is presented in this report. The method involves extracting raw image data from the game's 3D models and combining it with the game background to create game-like synthetic images and to generate the corresponding labels automatically. In an experiment I compared a model trained on synthetic data to a model trained on hand-labeled data and a model trained on a combined dataset. The model trained on the synthetic data showed higher detection precision on more classes and more reliable tracking performance of the player character. The model trained on the combined dataset did not perform better because of the different formats of the older hand-labeled dataset and the synthetic data.
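The core compositing step described above, placing an object cut-out on a background and deriving the label from the placement, can be sketched as follows. This is a minimal illustration and not the LeagueAI code: the function name `paste_object`, the boolean-mask representation of the cut-out, and the normalized (class, x_center, y_center, width, height) label layout are assumptions made for this example, and the sprite is assumed to fit entirely inside the background.

```python
import numpy as np

def paste_object(background, sprite, mask, top, left, class_id):
    """Paste a masked sprite onto a copy of the background and return (image, label).

    background: (H, W, 3) image array.
    sprite    : (h, w, 3) object image rendered from a model.
    mask      : (h, w) boolean array, True where the sprite pixel belongs to the object.
    The label is a hypothetical normalized box: (class, x_center, y_center, width, height).
    """
    img = background.copy()
    h, w = sprite.shape[:2]
    H, W = img.shape[:2]
    region = img[top:top + h, left:left + w]  # view into the copied image
    region[mask] = sprite[mask]               # overwrite only the object pixels
    label = (class_id, (left + w / 2) / W, (top + h / 2) / H, w / W, h / H)
    return img, label

# Toy example: a white 10x20 rectangle placed on a black 100x200 background.
background = np.zeros((100, 200, 3), dtype=np.uint8)
sprite = np.full((10, 20, 3), 255, dtype=np.uint8)
mask = np.ones((10, 20), dtype=bool)
image, label = paste_object(background, sprite, mask, top=30, left=40, class_id=0)
```

Because the placement is chosen by the generator, the bounding box is known exactly and no manual annotation is needed; randomizing the position, the background, and the sprite per image is what would turn this into a dataset generator.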

RO · Apr 11, 2019
ViTa-SLAM: Biologically-Inspired Visuo-Tactile SLAM

Oliver Struckmeier, Kshitij Tiwari, Martin J. Pearson et al.

In this work, we propose a novel, bio-inspired multi-sensory SLAM approach called ViTa-SLAM. Compared to other multisensory SLAM variants, this approach allows for a seamless multi-sensory information fusion whilst naturally interacting with the environment. The algorithm is empirically evaluated in a simulated setting using a biomimetic robot platform called the WhiskEye. Our results show promising performance enhancements over existing bio-inspired SLAM approaches in terms of loop-closure detection.