Dietrich Paulus

CV
h-index9
20papers
4,865citations
Novelty35%
AI Score37

20 Papers

CVSep 17, 2024Code
HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios

Nick Theisen, Robin Bartsch, Dietrich Paulus et al.

Semantic segmentation is an essential step for many vision applications in order to understand a scene and the objects within. Recent progress in hyperspectral imaging technology enables the application in driving scenarios and the hope is that the devices perceptive abilities provide an advantage over RGB-cameras. Even though some datasets exist, there is no standard benchmark available to systematically measure progress on this task and evaluate the benefit of hyperspectral data. In this paper, we work towards closing this gap by providing the HyperSpectral Semantic Segmentation benchmark (HS3-Bench). It combines annotated hyperspectral images from three driving scenario datasets and provides standardized metrics, implementations, and evaluation protocols. We use the benchmark to derive two strong baseline models that surpass the previous state-of-the-art performances with and without pre-training on the individual datasets. Further, our results indicate that the existing learning-based methods benefit more from leveraging additional RGB training data than from leveraging the additional hyperspectral channels. This poses important questions for future research on hyperspectral imaging for semantic segmentation in driving scenarios. Code to run the benchmark and the strong baseline approaches are available under https://github.com/nickstheisen/hyperseg.

ROMay 14, 2019Code
Scratchy: A Lightweight Modular Autonomous Robot for Robotic Competitions

Raphael Memmesheimer, Isabelle Kuhlmann, Mark Mints et al.

We present Scratchy---a modular, lightweight robot built for low budget competition attendances. Its base is mainly built with standard 4040 aluminium profiles and the robot is driven by four mecanum wheels on brushless DC motors. In combination with a laser range finder we use estimated odometry -- which is calculated by encoders -- for creating maps using a particle filter. A RGB-D camera is utilized for object detection and pose estimation. Additionally, there is the option to use a 6-DOF arm to grip objects from an estimated pose or generally for manipulation tasks. The robot can be assembled in less than one hour and fits into two pieces of hand luggage or one bigger suitcase. Therefore, it provides a huge advantage for student teams that participate in robot competitions like the European Robotics League or RoboCup. Thus, this keeps the funding required for participation, which is often a big hurdle for student teams to overcome, low. The software and additional hardware descriptions are available under: https://github.com/homer-robotics/scratchy.

CVFeb 2, 2024
3D Vertebrae Measurements: Assessing Vertebral Dimensions in Human Spine Mesh Models Using Local Anatomical Vertebral Axes

Ivanna Kramer, Vinzent Rittel, Lara Blomenkamp et al.

Vertebral morphological measurements are important across various disciplines, including spinal biomechanics and clinical applications, pre- and post-operatively. These measurements also play a crucial role in anthropological longitudinal studies, where spinal metrics are repeatedly documented over extended periods. Traditionally, such measurements have been manually conducted, a process that is time-consuming. In this study, we introduce a novel, fully automated method for measuring vertebral morphology using 3D meshes of lumbar and thoracic spine models.Our experimental results demonstrate the method's capability to accurately measure low-resolution patient-specific vertebral meshes with mean absolute error (MAE) of 1.09 mm and those derived from artificially created lumbar spines, where the average MAE value was 0.7 mm. Our qualitative analysis indicates that measurements obtained using our method on 3D spine models can be accurately reprojected back onto the original medical images if these images are available.

CVSep 17, 2025
Data-Efficient Spectral Classification of Hyperspectral Data Using MiniROCKET and HDC-MiniROCKET

Nick Theisen, Kenny Schlegel, Dietrich Paulus et al.

The classification of pixel spectra of hyperspectral images, i.e. spectral classification, is used in many fields ranging from agricultural, over medical to remote sensing applications and is currently also expanding to areas such as autonomous driving. Even though for full hyperspectral images the best-performing methods exploit spatial-spectral information, performing classification solely on spectral information has its own advantages, e.g. smaller model size and thus less data required for training. Moreover, spectral information is complementary to spatial information and improvements on either part can be used to improve spatial-spectral approaches in the future. Recently, 1D-Justo-LiuNet was proposed as a particularly efficient model with very few parameters, which currently defines the state of the art in spectral classification. However, we show that with limited training data the model performance deteriorates. Therefore, we investigate MiniROCKET and HDC-MiniROCKET for spectral classification to mitigate that problem. The model extracts well-engineered features without trainable parameters in the feature extraction part and is therefore less vulnerable to limited training data. We show that even though MiniROCKET has more parameters it outperforms 1D-Justo-LiuNet in limited data scenarios and is mostly on par with it in the general case

CVDec 6, 2024
Spinal ligaments detection on vertebrae meshes using registration and 3D edge detection

Ivanna Kramer, Lara Blomenkamp, Kevin Weirauch et al.

Spinal ligaments are crucial elements in the complex biomechanical simulation models as they transfer forces on the bony structure, guide and limit movements and stabilize the spine. The spinal ligaments encompass seven major groups being responsible for maintaining functional interrelationships among the other spinal components. Determination of the ligament origin and insertion points on the 3D vertebrae models is an essential step in building accurate and complex spine biomechanical models. In our paper, we propose a pipeline that is able to detect 66 spinal ligament attachment points by using a step-wise approach. Our method incorporates a fast vertebra registration that strategically extracts only 15 3D points to compute the transformation, and edge detection for a precise projection of the registered ligaments onto any given patient-specific vertebra model. Our method shows high accuracy, particularly in identifying landmarks on the anterior part of the vertebra with an average distance of 2.24 mm for anterior longitudinal ligament and 1.26 mm for posterior longitudinal ligament landmarks. The landmark detection requires approximately 3.0 seconds per vertebra, providing a substantial improvement over existing methods. Clinical relevance: using the proposed method, the required landmarks that represent origin and insertion points for forces in the biomechanical spine models can be localized automatically in an accurate and time-efficient manner.

IVDec 6, 2024
Reconstruction of 3D lumbar spine models from incomplete segmentations using landmark detection

Lara Blomenkamp, Ivanna Kramer, Sabine Bauer et al.

Patient-specific 3D spine models serve as a foundation for spinal treatment and surgery planning as well as analysis of loading conditions in biomechanical and biomedical research. Despite advancements in imaging technologies, the reconstruction of complete 3D spine models often faces challenges due to limitations in imaging modalities such as planar X-Ray and missing certain spinal structures, such as the spinal or transverse processes, in volumetric medical images and resulting segmentations. In this study, we present a novel accurate and time-efficient method to reconstruct complete 3D lumbar spine models from incomplete 3D vertebral bodies obtained from segmented magnetic resonance images (MRI). In our method, we use an affine transformation to align artificial vertebra models with patient-specific incomplete vertebrae. The transformation matrix is derived from vertebra landmarks, which are automatically detected on the vertebra endplates. The results of our evaluation demonstrate the high accuracy of the performed registration, achieving an average point-to-model distance of 1.95 mm. Additionally, in assessing the morphological properties of the vertebrae and intervertebral characteristics, our method demonstrated a mean absolute error (MAE) of 3.4° in the angles of functional spine units (FSUs), emphasizing its effectiveness in maintaining important spinal features throughout the transformation process of individual vertebrae. Our method achieves the registration of the entire lumbar spine, spanning segments L1 to L5, in just 0.14 seconds, showcasing its time-efficiency. Clinical relevance: the fast and accurate reconstruction of spinal models from incomplete input data such as segmentations provides a foundation for many applications in spine diagnostics, treatment planning, and the development of spinal healthcare solutions.

CVJan 14, 2024
Generation of Synthetic Images for Pedestrian Detection Using a Sequence of GANs

Viktor Seib, Malte Roosen, Ida Germann et al.

Creating annotated datasets demands a substantial amount of manual effort. In this proof-of-concept work, we address this issue by proposing a novel image generation pipeline. The pipeline consists of three distinct generative adversarial networks (previously published), combined in a novel way to augment a dataset for pedestrian detection. Despite the fact that the generated images are not always visually pleasant to the human eye, our detection benchmark reveals that the results substantially surpass the baseline. The presented proof-of-concept work was done in 2020 and is now published as a technical report after a three years retention period.

ROOct 13, 2021
Next-Best-View Estimation based on Deep Reinforcement Learning for Active Object Classification

Christian Korbach, Markus D. Solbach, Raphael Memmesheimer et al.

The presentation and analysis of image data from a single viewpoint are often not sufficient to solve a task. Several viewpoints are necessary to obtain more information. The next-best-view problem attempts to find the optimal viewpoint with the greatest information gain for the underlying task. In this work, a robot arm holds an object in its end-effector and searches for a sequence of next-best-view to explicitly identify the object. We use Soft Actor-Critic (SAC), a method of deep reinforcement learning, to learn these next-best-views for a specific set of objects. The evaluation shows that an agent can learn to determine an object pose to which the robot arm should move an object. This leads to a viewpoint that provides a more accurate prediction to distinguish such an object from other objects better. We make the code publicly available for the scientific community and for reproducibility.

CVSep 27, 2021
Fusion-GCN: Multimodal Action Recognition using Graph Convolutional Networks

Michael Duhme, Raphael Memmesheimer, Dietrich Paulus

In this paper, we present Fusion-GCN, an approach for multimodal action recognition using Graph Convolutional Networks (GCNs). Action recognition methods based around GCNs recently yielded state-of-the-art performance for skeleton-based action recognition. With Fusion-GCN, we propose to integrate various sensor data modalities into a graph that is trained using a GCN model for multi-modal action recognition. Additional sensor measurements are incorporated into the graph representation, either on a channel dimension (introducing additional node attributes) or spatial dimension (introducing new nodes). Fusion-GCN was evaluated on two public available datasets, the UTD-MHAD- and MMACT datasets, and demonstrates flexible fusion of RGB sequences, inertial measurements and skeleton sequences. Our approach gets comparable results on the UTD-MHAD dataset and improves the baseline on the large-scale MMACT dataset by a significant margin of up to 12.37% (F1-Measure) with the fusion of skeleton estimates and accelerometer measurements.

CVDec 26, 2020
Skeleton-DML: Deep Metric Learning for Skeleton-Based One-Shot Action Recognition

Raphael Memmesheimer, Simon Häring, Nick Theisen et al.

One-shot action recognition allows the recognition of human-performed actions with only a single training example. This can influence human-robot-interaction positively by enabling the robot to react to previously unseen behaviour. We formulate the one-shot action recognition problem as a deep metric learning problem and propose a novel image-based skeleton representation that performs well in a metric learning setting. Therefore, we train a model that projects the image representations into an embedding space. In embedding space the similar actions have a low euclidean distance while dissimilar actions have a higher distance. The one-shot action recognition problem becomes a nearest-neighbor search in a set of activity reference samples. We evaluate the performance of our proposed representation against a variety of other skeleton-based image representations. In addition, we present an ablation study that shows the influence of different embedding vector sizes, losses and augmentation. Our approach lifts the state-of-the-art by 3.3% for the one-shot action recognition protocol on the NTU RGB+D 120 dataset under a comparable training setup. With additional augmentation our result improved over 7.7%.

CVApr 23, 2020
SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Raphael Memmesheimer, Nick Theisen, Dietrich Paulus

Recognizing an activity with a single reference sample using metric learning approaches is a promising research field. The majority of few-shot methods focus on object recognition or face-identification. We propose a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using triplet loss, we learn a feature embedding. The resulting encoder transforms features into an embedding space in which closer distances encode similar actions while higher distances encode different actions. Our approach is based on a signal level formulation and remains flexible across a variety of modalities. It further outperforms the baseline on the large scale NTU RGB+D 120 dataset for the One-Shot action recognition protocol by 5.6%. With just 60% of the training data, our approach still outperforms the baseline approach by 3.7%. With 40% of the training data, our approach performs comparably well to the second follow up. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton and fused data and the Simitate dataset for motion capturing data. Furthermore, our inter-joint and inter-sensor experiments suggest good capabilities on previously unseen setups.

CVMar 13, 2020
Gimme Signals: Discriminative signal encoding for multimodal activity recognition

Raphael Memmesheimer, Nick Theisen, Dietrich Paulus

We present a simple, yet effective and flexible method for action recognition supporting multiple sensor modalities. Multivariate signal sequences are encoded in an image and are then classified using a recently proposed EfficientNet CNN architecture. Our focus was to find an approach that generalizes well across different sensor modalities without specific adaptions while still achieving good results. We apply our method to 4 action recognition datasets containing skeleton sequences, inertial and motion capturing measurements as well as \wifi fingerprints that range up to 120 action classes. Our method defines the current best CNN-based approach on the NTU RGB+D 120 dataset, lifts the state of the art on the ARIL Wi-Fi dataset by +6.78%, improves the UTD-MHAD inertial baseline by +14.4%, the UTD-MHAD skeleton baseline by 1.13% and achieves 96.11% on the Simitate motion capturing data (80/20 split). We further demonstrate experiments on both, modality fusion on a signal level and signal reduction to prevent the representation from overloading.

CVJun 25, 2019
Gesture Recognition in RGB Videos UsingHuman Body Keypoints and Dynamic Time Warping

Pascal Schneider, Raphael Memmesheimer, Ivanna Kramer et al.

Gesture recognition opens up new ways for humans to intuitively interact with machines. Especially for service robots, gestures can be a valuable addition to the means of communication to, for example, draw the robot's attention to someone or something. Extracting a gesture from video data and classifying it is a challenging task and a variety of approaches have been proposed throughout the years. This paper presents a method for gesture recognition in RGB videos using OpenPose to extract the pose of a person and Dynamic Time Warping (DTW) in conjunction with One-Nearest-Neighbor (1NN) for time-series classification. The main features of this approach are the independence of any specific hardware and high flexibility, because new gestures can be added to the classifier by adding only a few examples of it. We utilize the robustness of the Deep Learning-based OpenPose framework while avoiding the data-intensive task of training a neural network ourselves. We demonstrate the classification performance of our method using a public dataset.

LGMay 15, 2019
Simitate: A Hybrid Imitation Learning Benchmark

Raphael Memmesheimer, Ivanna Mykhalchyshyna, Viktor Seib et al.

We present Simitate --- a hybrid benchmarking suite targeting the evaluation of approaches for imitation learning. A dataset containing 1938 sequences where humans perform daily activities in a realistic environment is presented. The dataset is strongly coupled with an integration into a simulator. RGB and depth streams with a resolution of 960$\mathbb{\times}$540 at 30Hz and accurate ground truth poses for the demonstrator's hand, as well as the object in 6 DOF at 120Hz are provided. Along with our dataset we provide the 3D model of the used environment, labeled object images and pre-trained models. A benchmarking suite that aims at fostering comparability and reproducibility supports the development of imitation learning approaches. Further, we propose and integrate evaluation metrics on assessing the quality of effect and trajectory of the imitation performed in simulation. Simitate is available on our project website: \url{https://agas.uni-koblenz.de/data/simitate/}.

ROMar 25, 2019
Trends, Challenges and Adopted Strategies in RoboCup@Home (2019 version)

Mauricio Matamoros, Viktor Seib, Dietrich Paulus

Scientific competitions are crucial in the field of service robotics. They foster knowledge exchange and benchmarking, allowing teams to test their research in unstandardized scenarios. In this paper, we summarize the trending solutions and approaches used in RoboCup@Home. Further on, we discuss the attained achievements and challenges to overcome in relation with the progress required to fulfill the long-term goal of the league. Consequently, we propose a set of milestones for upcoming competitions by considering the current capabilities of the robots and their limitations. With this work we aim at laying the foundations towards the creation of roadmaps that can help to direct efforts in testing and benchmarking in robotics competitions.

ROMar 6, 2019
Trends, Challenges and Adopted Strategies in RoboCup@Home

Mauricio Matamoros, Viktor Seib, Dietrich Paulus

Scientific competitions are crucial in the field of service robotics. They foster knowledge exchange and allow teams to test their research in unstandardized scenarios and compare result. Such is the case of RoboCup@Home. However, keeping track of all the technologies and solution approaches used by teams to solve the tests can be a challenge in itself. Moreover, after eleven years of competitions, it's easy to delve too much into the field, losing perspective and forgetting about the user's needs and long term goals. In this paper, we aim to tackle this problems by presenting a summary of the trending solutions and approaches used in RoboCup@Home, and discussing the attained achievements and challenges to overcome in relation with the progress required to fulfill the long-term goal of the league. Hence, considering the current capabilities of the robots and their limitations, we propose a set of milestones to address in upcoming competitions. With this work we lay the foundations towards the creation of roadmaps that can help to direct efforts in testing and benchmarking in robotics competitions.

ROFeb 2, 2019
RoboCup@Home: Summarizing achievements in over eleven years of competition

Mauricio Matamoros, Viktor Seib, Raphael Memmesheimer et al.

Scientific competitions are important in robotics because they foster knowledge exchange and allow teams to test their research in unstandardized scenarios and compare result. In the field of service robotics its role becomes crucial. Competitions like RoboCup@Home bring robots to people, a fundamental step to integrate them into society. In this paper we summarize and discuss the differences between the achievements claimed by teams in their team description papers, and the results observed during the competition^1 from a qualitative perspective. We conclude with a set of important challenges to be conquered first in order to take robots to people's homes. We believe that competitions are also an excellent opportunity to collect data of direct and unbiased interactions for further research. ^1 The authors belong to several teams who have participated in RoboCup@Home as early as 2007

ROFeb 2, 2019
From Commands to Goal-based Dialogs: A Roadmap to Achieve Natural Language Interaction in RoboCup@Home

Mauricio Matamoros, Karin Harbusch, Dietrich Paulus

On the one hand, speech is a key aspect to people's communication. On the other, it is widely acknowledged that language proficiency is related to intelligence. Therefore, intelligent robots should be able to understand, at least, people's orders within their application domain. These insights are not new in RoboCup@Home, but we lack of a long-term plan to evaluate this approach. In this paper we conduct a brief review of the achievements on automated speech recognition and natural language understanding in RoboCup@Home. Furthermore, we discuss main challenges to tackle in spoken human-robot interaction within the scope of this competition. Finally, we contribute by presenting a pipelined road map to engender research in the area of natural language understanding applied to domestic service robotics.

CVJul 30, 2018
Markerless Visual Robot Programming by Demonstration

Raphael Memmesheimer, Ivanna Mykhalchyshyna, Viktor Seib et al.

In this paper we present an approach for learning to imitate human behavior on a semantic level by markerless visual observation. We analyze a set of spatial constraints on human pose data extracted using convolutional pose machines and object informations extracted from 2D image sequences. A scene analysis, based on an ontology of objects and affordances, is combined with continuous human pose estimation and spatial object relations. Using a set of constraints we associate the observed human actions with a set of executable robot commands. We demonstrate our approach in a kitchen task, where the robot learns to prepare a meal.

CVMar 21, 2017
Simple Online and Realtime Tracking with a Deep Association Metric

Nicolai Wojke, Alex Bewley, Dietrich Paulus

Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches. In spirit of the original framework we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a large-scale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.