RO Mar 29, 2022
Gaze-based Object Detection in the Wild
Daniel Weber, Wolfgang Fuhl, Andreas Zell et al.
In human-robot collaboration, one challenging task is to teach a robot new, previously unknown objects so that it can interact with them. Gaze can contain valuable information for this purpose. We investigate whether it is possible to detect objects (object or no object) merely from gaze data and to determine their bounding box parameters. To this end, we explore different sizes of temporal windows, which serve as a basis for the computation of heatmaps, i.e., the spatial distribution of the gaze data. Additionally, we analyze different grid sizes of these heatmaps and demonstrate the functionality in a proof of concept using different machine learning techniques. Our method is characterized by its speed and resource efficiency compared to conventional object detectors. To generate the required data, we conducted a study with five subjects who could move freely and thus turn towards arbitrary objects. This way, we chose a scenario for our data collection that is as realistic as possible. Since the subjects move while facing objects, the heatmaps also contain gaze data trajectories, complicating the detection and parameter regression. We make our data set publicly available to the research community for download.
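The heatmap computation described in this abstract can be illustrated with a minimal sketch. It assumes gaze samples arrive as (timestamp, x, y) tuples with coordinates normalized to [0, 1]; the function name `gaze_heatmap`, the sample layout, and the normalization are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def gaze_heatmap(
    samples: Sequence[Tuple[float, float, float]],
    window: Tuple[float, float],
    grid: Tuple[int, int],
) -> List[List[float]]:
    """Bin the gaze samples of one temporal window into a rows x cols grid.

    samples: (t, x, y) tuples, x and y normalized to [0, 1] (assumption).
    window:  (t_start, t_end), half-open interval in the unit of t.
    grid:    (rows, cols) resolution of the heatmap.
    Returns relative frequencies per cell (all zeros if the window is empty).
    """
    rows, cols = grid
    t_start, t_end = window
    counts = [[0] * cols for _ in range(rows)]
    total = 0
    for t, x, y in samples:
        if not (t_start <= t < t_end):
            continue  # sample lies outside the temporal window
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            continue  # gaze point lies outside the scene image
        r = min(int(y * rows), rows - 1)  # clamp y == 1.0 into the last row
        c = min(int(x * cols), cols - 1)  # clamp x == 1.0 into the last column
        counts[r][c] += 1
        total += 1
    if total == 0:
        return [[0.0] * cols for _ in range(rows)]
    return [[count / total for count in row] for row in counts]


demo_samples = [(0.0, 0.10, 0.10), (0.1, 0.12, 0.10), (0.2, 0.90, 0.90), (5.0, 0.5, 0.5)]
demo_heatmap = gaze_heatmap(demo_samples, window=(0.0, 1.0), grid=(2, 2))
```

Varying `window` and `grid` in such a function corresponds to the two parameters the abstract explores: the temporal window size and the heatmap grid size.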
LG Jul 6, 2023
Improving the Efficiency of Human-in-the-Loop Systems: Adding Artificial to Human Experts
Johannes Jakubik, Daniel Weber, Patrick Hemmer et al.
Information systems increasingly leverage artificial intelligence (AI) and machine learning (ML) to generate value from vast amounts of data. However, ML models are imperfect and can generate incorrect classifications. Hence, human-in-the-loop (HITL) extensions to ML models add a human review for instances that are difficult to classify. This study argues that continuously relying on human experts to handle difficult model classifications leads to a strong increase in human effort, which strains limited resources. To address this issue, we propose a hybrid system that creates artificial experts that learn to classify data instances from unknown classes previously reviewed by human experts. Our hybrid system assesses which artificial expert is suitable for classifying an instance from an unknown class and automatically assigns it. Over time, this reduces human effort and increases the efficiency of the system. Our experiments demonstrate that our approach outperforms traditional HITL systems for several benchmarks on image classification.
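The assignment step described in this abstract could look roughly like the following sketch. The abstract does not say how suitability is assessed, so the confidence scores, the two thresholds, and the function name `route_instance` are all hypothetical assumptions made for illustration.

```python
from typing import Dict


def route_instance(
    model_confidence: float,
    expert_scores: Dict[str, float],
    model_threshold: float = 0.9,   # hypothetical value
    expert_threshold: float = 0.8,  # hypothetical value
) -> str:
    """Decide who classifies an instance: the base model, an artificial expert, or a human.

    model_confidence: confidence of the general ML model for this instance.
    expert_scores:    suitability score per artificial expert (assumed to be given).
    """
    if model_confidence >= model_threshold:
        return "model"  # the base model handles instances it is confident about
    if expert_scores:
        name, score = max(expert_scores.items(), key=lambda item: item[1])
        if score >= expert_threshold:
            return name  # the most suitable artificial expert takes over
    return "human"  # fall back to human review


confident = route_instance(0.95, {})
delegated = route_instance(0.40, {"expert_a": 0.85, "expert_b": 0.60})
escalated = route_instance(0.40, {"expert_a": 0.50})
```

Under these assumptions, human effort drops as more artificial experts become available, because fewer instances reach the final fallback.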
8.2 SY May 26
Container Unloading via Reinforcement Learning: Picking Order, Deadlock Avoidance, and Proof-of-Concept Simulation
Jan Rüdiger, Max Schenke, Daniel Weber
Unloading containers in the courier, express and parcel industry is physically demanding and labor-intensive work. Automating this process is an important step towards increasing the efficiency of parcel-handling systems. This work investigates the potential of reinforcement learning (RL) to learn a policy for item selection in container unloading scenarios. To this end, a simulation environment is created and masked deep Q-learning with a specially designed neural network architecture is implemented. The results indicate that the agent can learn to select items with an average success rate of 60 %, which is significantly better than the 20 % success rate of a random policy. The findings suggest that RL could be a promising approach for automating item unloading tasks in the future.
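The following sketch shows one common reading of "masked" Q-learning: invalid actions are excluded when selecting an item. It assumes that the mask marks which items can currently be picked; the function names and the epsilon-greedy exploration are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from typing import Sequence


def masked_greedy_action(q_values: Sequence[float], valid_mask: Sequence[bool]) -> int:
    """Return the index of the highest Q-value among the valid actions only."""
    best_index, best_q = -1, -math.inf
    for index, (q, is_valid) in enumerate(zip(q_values, valid_mask)):
        if is_valid and q > best_q:
            best_index, best_q = index, q
    if best_index < 0:
        raise ValueError("no valid action available")
    return best_index


def masked_epsilon_greedy(
    q_values: Sequence[float],
    valid_mask: Sequence[bool],
    epsilon: float,
    rng: random.Random,
) -> int:
    """Explore uniformly over valid actions with probability epsilon, else act greedily."""
    valid_indices = [i for i, is_valid in enumerate(valid_mask) if is_valid]
    if not valid_indices:
        raise ValueError("no valid action available")
    if rng.random() < epsilon:
        return rng.choice(valid_indices)
    return masked_greedy_action(q_values, valid_mask)


# Item 1 has the highest Q-value but is masked out, so item 2 is selected.
greedy_choice = masked_greedy_action([1.0, 5.0, 3.0], [True, False, True])
explored = masked_epsilon_greedy([1.0, 5.0, 3.0], [True, False, True], 1.0, random.Random(0))
```

With a mask like this, neither the greedy step nor the exploration step can pick an action flagged as invalid.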
HC Dec 12, 2023
Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction
Daniel Weber
Robots are becoming increasingly popular in a wide range of environments due to their exceptional work capacity, precision, efficiency, and scalability. This development has been further encouraged by advances in Artificial Intelligence, particularly Machine Learning. By employing sophisticated neural networks, robots are given the ability to detect and interact with objects in their vicinity. However, a significant drawback arises from the dependency of these object detection models on extensive datasets and substantial amounts of training data. This issue becomes particularly problematic when the robot's specific deployment location and surroundings are not known in advance. The vast and ever-expanding array of objects makes it virtually impossible to comprehensively cover the entire spectrum of existing objects using preexisting datasets alone. The goal of this dissertation was to teach a robot unknown objects in the context of Human-Robot Interaction (HRI) in order to liberate it from its data dependency and free it from predefined scenarios. In this context, the combination of eye tracking and Augmented Reality created a powerful synergy that empowered the human teacher to communicate with the robot and effortlessly point out objects by means of human gaze. This holistic approach led to the development of a multimodal HRI system that enabled the robot to identify and visually segment the Objects of Interest in 3D space. Through the class information provided by the human, the robot was able to learn the objects and redetect them at a later stage. Due to the knowledge gained from this HRI-based teaching, the robot's object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets, without being restricted to predefined classes, showcasing its versatility and adaptability.
SY Jan 31, 2022
Steady-State Error Compensation in Reference Tracking and Disturbance Rejection Problems for Reinforcement Learning-Based Control
Daniel Weber, Maximilian Schenke, Oliver Wallscheid
Reinforcement learning (RL) is a promising, upcoming topic in automatic control applications. Where classical control approaches require a priori system knowledge, data-driven control approaches like RL allow a model-free controller design procedure, rendering them emergent techniques for systems with changing plant structures and varying parameters. While it was already shown in various applications that the transient control behavior for complex systems can be sufficiently handled by RL, the challenge of non-vanishing steady-state control errors remains, which arises from the usage of control policy approximations and finite training times. To overcome this issue, an integral action state augmentation (IASA) for actor-critic-based RL controllers is introduced that mimics an integrating feedback, which is inspired by the delta-input formulation within model predictive control. This augmentation does not require any expert knowledge, leaving the approach model-free. As a result, the RL controller learns how to suppress steady-state control deviations much more effectively. Two exemplary applications from the domain of electrical energy engineering validate the benefit of the developed method both for reference tracking and disturbance rejection. In comparison to a standard deep deterministic policy gradient (DDPG) setup, the suggested IASA extension reduces the steady-state error by up to 52 % within the considered validation scenarios.
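One plausible reading of an integral action state augmentation is sketched below: the accumulated control error is appended to the observation passed to the actor and critic, so the policy can react to persistent offsets. The class name, the clamping limit, and the scalar-error setup are assumptions for illustration; the paper's exact formulation may differ.

```python
from typing import List, Sequence


class IntegralErrorAugmenter:
    """Append a running integral of the control error to each observation (sketch)."""

    def __init__(self, dt: float, limit: float) -> None:
        self.dt = dt          # sampling time of the control loop
        self.limit = limit    # hypothetical anti-windup clamp on the integral
        self.integral = 0.0

    def reset(self) -> None:
        """Clear the integrator at the start of every episode."""
        self.integral = 0.0

    def augment(self, observation: Sequence[float], reference: float, measurement: float) -> List[float]:
        """Integrate the error (reference - measurement) and append it to the observation."""
        error = reference - measurement
        self.integral += error * self.dt
        self.integral = max(-self.limit, min(self.limit, self.integral))
        return list(observation) + [self.integral]


augmenter = IntegralErrorAugmenter(dt=0.1, limit=1.0)
first = augmenter.augment([0.5], reference=1.0, measurement=0.5)
second = augmenter.augment([0.6], reference=1.0, measurement=0.6)
```

A constant offset between reference and measurement makes the appended value grow over time, which is the integrating behavior the abstract refers to. No plant model is needed to compute it.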
CV Jan 19, 2022
GroupGazer: A Tool to Compute the Gaze per Participant in Groups with integrated Calibration to Map the Gaze Online to a Screen or Beamer Projection
Wolfgang Fuhl, Daniel Weber, Shahram Eivazi
In this paper we present GroupGazer, a tool that can be used to calculate the gaze direction and the gaze position of whole groups. GroupGazer calculates the gaze direction of every single person in the image and allows these gaze vectors to be mapped onto a projection surface, such as a screen or a projector image. In addition to the person-specific gaze direction, the person affiliation of each gaze vector is stored based on the position in the image. It is also possible to save the group attention after a calibration. The software is free to use and requires a simple webcam as well as an NVIDIA GPU and the operating system Windows or Linux. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FGroupGazer&mode=list
CV Jan 18, 2022
Pistol: Pupil Invisible Supportive Tool to extract Pupil, Iris, Eye Opening, Eye Movements, Pupil and Iris Gaze Vector, and 2D as well as 3D Gaze
Wolfgang Fuhl, Daniel Weber, Shahram Eivazi
This paper describes a feature extraction and gaze estimation software named Pistol that can be used with Pupil Invisible projects and, in the future, with other eye trackers. In offline mode, our software extracts multiple features from the eye, including the pupil and iris ellipse, eye aperture, pupil vector, iris vector, eye movement types from pupil and iris velocities, marker detection, marker distance, and 2D gaze estimation for the pupil center, iris center, pupil vector, and iris vector using Levenberg-Marquardt fitting and neural networks. The gaze signal is computed in 2D for each eye and each feature separately, and in 3D for both eyes, also for each feature separately. We hope this software helps other researchers to extract state-of-the-art features for their research out of their recordings. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FPISTOL&mode=list
CR Jun 7, 2021
Osiris: Automated Discovery of Microarchitectural Side Channels
Daniel Weber, Ahmad Ibrahim, Hamed Nemati et al.
In recent years, a series of side channels have been discovered on CPUs. These side channels have been used in powerful attacks, e.g., on cryptographic implementations, or as building blocks in transient-execution attacks such as Spectre or Meltdown. However, in many cases, discovering side channels is still a tedious manual process. In this paper, we present Osiris, a fuzzing-based framework to automatically discover microarchitectural side channels. Based on a machine-readable specification of a CPU's ISA, Osiris generates instruction-sequence triples and automatically tests whether they form a timing-based side channel. Furthermore, Osiris evaluates their usability as a side channel in transient-execution attacks, i.e., as the microarchitectural encoding for attacks like Spectre. In total, we discover four novel timing-based side channels on Intel and AMD CPUs. Based on these side channels, we demonstrate exploitation in three case studies. We show that our microarchitectural KASLR break using non-temporal loads, FlushConflict, even works on the new Intel Ice Lake and Comet Lake microarchitectures. We present a cross-core cross-VM covert channel that does not rely on the memory subsystem and transmits up to 1 kbit/s. We demonstrate this channel on the AWS cloud, showing that it is stealthy and noise resistant. Finally, we demonstrate Stream+Reload, a covert channel for transient-execution attacks that, on average, allows leaking 7.83 bytes within a transient window, improving state-of-the-art attacks that only leak up to 3 bytes.
LG May 5, 2021
Non-Autoregressive vs Autoregressive Neural Networks for System Identification
Daniel Weber, Clemens Gühmann
The application of neural networks to non-linear dynamic system identification tasks has a long history, which consists mostly of autoregressive approaches. Autoregression, the usage of the model outputs of previous time steps, is a method of transferring a system state between time steps, which is not necessary for modeling dynamic systems with modern neural network structures, such as gated recurrent units (GRUs) and Temporal Convolutional Networks (TCNs). We compare the accuracy and execution performance of autoregressive and non-autoregressive implementations of a GRU and TCN on the simulation task of three publicly available system identification benchmarks. Our results show that the non-autoregressive neural networks are significantly faster and at least as accurate as their autoregressive counterparts. Comparisons with other state-of-the-art black-box system identification methods show that our implementation of the non-autoregressive GRU is the best performing neural network-based system identification method and, in the benchmarks without extrapolation, the best performing black-box method.
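The structural difference between the two approaches can be shown with toy scalar models. In the sketch below, plain Python callables stand in for the neural networks; the function names and the linear example dynamics are assumptions for illustration and do not correspond to the GRU or TCN implementations evaluated in the paper.

```python
from typing import Callable, List, Sequence


def simulate_autoregressive(
    step: Callable[[float, float], float],
    inputs: Sequence[float],
    y0: float,
) -> List[float]:
    """Autoregressive simulation: the previous model output is fed back as an input."""
    y_prev = y0
    outputs = []
    for u in inputs:
        y_prev = step(y_prev, u)  # output of step k-1 carries the system state
        outputs.append(y_prev)
    return outputs


def simulate_recurrent(
    cell: Callable[[float, float], float],
    readout: Callable[[float], float],
    inputs: Sequence[float],
    h0: float,
) -> List[float]:
    """Non-autoregressive simulation: an internal hidden state carries the system state."""
    h = h0
    outputs = []
    for u in inputs:
        h = cell(h, u)             # the hidden state is updated from the input only
        outputs.append(readout(h))  # the output is never fed back into the model
    return outputs


ar_outputs = simulate_autoregressive(lambda y, u: 0.5 * y + u, [1.0, 1.0, 1.0], y0=0.0)
rnn_outputs = simulate_recurrent(lambda h, u: 0.5 * h + u, lambda h: 2.0 * h, [1.0, 1.0, 1.0], h0=0.0)
```

In the second function the output never re-enters the model, so the state is passed between time steps through the hidden state alone.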
LG Apr 15, 2021
RIANN -- A Robust Neural Network Outperforms Attitude Estimation Filters
Daniel Weber, Clemens Gühmann, Thomas Seel
Inertial-sensor-based attitude estimation is a crucial technology in various applications, from human motion tracking to autonomous aerial and ground vehicles. Application scenarios differ in characteristics of the performed motion, presence of disturbances, and environmental conditions. Since state-of-the-art attitude estimators do not generalize well over these characteristics, their parameters must be tuned for the individual motion characteristics and circumstances. We propose RIANN, a ready-to-use, neural network-based, parameter-free, real-time-capable inertial attitude estimator, which generalizes well across different motion dynamics, environments, and sampling rates, without the need for application-specific adaptations. We gather six publicly available datasets of which we exploit two datasets for the method development and the training, and we use four datasets for evaluation of the trained estimator in three different test scenarios with varying practical relevance. Results show that RIANN outperforms state-of-the-art attitude estimation filters in the sense that it generalizes much better across a variety of motions and conditions in different applications, with different sensor hardware and different sampling frequencies. This is true even if the filters are tuned on each individual test dataset, whereas RIANN was trained on completely separate data and has never seen any of these test datasets. RIANN can be applied directly without adaptations or training and is therefore expected to enable plug-and-play solutions in numerous applications, especially when accuracy is crucial but no ground-truth data is available for tuning or when motion and disturbance characteristics are uncertain. We made RIANN publicly available.
CV Jan 11, 2021
The Gaze and Mouse Signal as additional Source for User Fingerprints in Browser Applications
Wolfgang Fuhl, Daniel Weber, Shahram Eivazi
In this work, we inspect different data sources for browser fingerprints. We show which disadvantages and limitations browser statistics have and how these can be avoided with other data sources. Since human visual behavior is a rich source of information and also contains person-specific information, it is a valuable source for browser fingerprints. However, human gaze acquisition in the browser also has disadvantages, such as the inaccuracy of webcam-based tracking and the restriction that the user must first allow access to the camera. Because mouse movements are known to correlate with human gaze, they can be used instead of the gaze signal. In our evaluation, we show the influence of all possible combinations of the three information sources for user recognition and describe our simple approach in detail. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FThe%20Gaze%20and%20Mouse%20Signal%20as%20additional%20Source%20...&mode=list
LG May 14, 2020
Neural Networks Versus Conventional Filters for Inertial-Sensor-based Attitude Estimation
Daniel Weber, Clemens Gühmann, Thomas Seel
Inertial measurement units are commonly used to estimate the attitude of moving objects. Numerous nonlinear filter approaches have been proposed for solving the inherent sensor fusion problem. However, when a large range of different dynamic and static rotational and translational motions is considered, the attainable accuracy is limited by the need for situation-dependent adjustment of accelerometer and gyroscope fusion weights. We investigate to what extent these limitations can be overcome by means of artificial neural networks and how much domain-specific optimization of the neural network model is required to outperform the conventional filter solution. A diverse set of motion recordings with a marker-based optical ground truth is used for performance evaluation and comparison. The proposed neural networks are found to outperform the conventional filter across all motions only if domain-specific optimizations are introduced. We conclude that they are a promising tool for inertial-sensor-based real-time attitude estimation, but both expert knowledge and rich datasets are required to achieve top performance.