CV · Jul 24, 2023
Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul
Keno Moenck, Arne Wendt, Philipp Prünte et al.
Deploying deep learning-based applications in specialized domains like the aircraft production industry typically suffers from a lack of available training data. Only a few datasets represent non-everyday objects, situations, and tasks. Recent advances in research around Vision Foundation Models (VFMs) have opened a new area of tasks and models with high generalization capabilities in both non-semantic and semantic predictions. As recently demonstrated by the Segment Anything Project, exploiting VFMs' zero-shot capabilities is a promising direction for tackling the boundaries spanned by data, context, and sensor variety. However, their application within specific domains is still the subject of ongoing research. This paper contributes here by surveying applications of the Segment Anything Model (SAM) in use cases specific to aircraft production. We include manufacturing, intralogistics, and maintenance, repair, and overhaul processes, which also represent a variety of neighboring industrial domains. Besides presenting the various use cases, we further discuss the injection of domain knowledge.
LG · Jan 23
CUROCKET: Optimizing ROCKET for GPU
Ole Stüven, Keno Moenck, Thorsten Schüppstuhl
ROCKET (RandOm Convolutional KErnel Transform) is a feature extraction algorithm for Time Series Classification (TSC), published in 2019. It convolves a time series with randomly generated kernels, producing features that can be used to train a linear classifier or regressor such as ridge regression. At the time of publication, ROCKET matched the accuracy of the best state-of-the-art TSC algorithms while being significantly less computationally expensive, making it a compelling choice for TSC. This also led to several subsequent versions that further improved accuracy and computational efficiency. The currently available ROCKET implementations are mostly bound to execution on CPU. However, convolution is highly parallelizable and therefore well suited to execution on GPU, which speeds up the computation significantly. A key difficulty arises from the inhomogeneous kernels ROCKET uses, which make standard methods for applying convolution on GPU inefficient. In this work, we propose an algorithm that efficiently performs ROCKET on GPU and achieves up to 11 times higher computational efficiency per watt than ROCKET on CPU. The code for CUROCKET is available on GitHub at https://github.com/oleeven/CUROCKET.
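To make the point about inhomogeneous kernels concrete, here is a minimal plain-Python CPU sketch of the original ROCKET transform; it is not CUROCKET's GPU algorithm. It follows the sampling scheme of the 2019 ROCKET paper: kernel lengths from {7, 9, 11}, mean-centred N(0, 1) weights, a U(-1, 1) bias, exponentially sampled dilation, and padding with probability 0.5. Each kernel yields two features, the proportion of positive values (PPV) and the maximum. Because every kernel has its own length, dilation, and padding, the kernels cannot simply be stacked into one dense weight tensor for a single standard convolution call. The function names are illustrative.

```python
import math
import random


def generate_kernel(input_length, rng):
    """Draw one random kernel following the ROCKET sampling scheme."""
    length = rng.choice((7, 9, 11))
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    centre = sum(weights) / length
    weights = [w - centre for w in weights]  # mean-centre the weights
    bias = rng.uniform(-1.0, 1.0)
    # Dilation is sampled on an exponential scale so the receptive field fits the series.
    max_exponent = math.log2((input_length - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, max_exponent))
    padding = ((length - 1) * dilation) // 2 if rng.random() < 0.5 else 0
    return weights, bias, dilation, padding


def apply_kernel(x, kernel):
    """Dilated 1D convolution of x with one kernel; returns the (PPV, max) features."""
    weights, bias, dilation, padding = kernel
    out_len = len(x) + 2 * padding - (len(weights) - 1) * dilation
    positive, best = 0, -math.inf
    for i in range(-padding, -padding + out_len):
        s = bias
        for j, w in enumerate(weights):
            idx = i + j * dilation
            if 0 <= idx < len(x):  # positions outside the series act as zero padding
                s += w * x[idx]
        positive += s > 0
        best = max(best, s)
    return positive / out_len, best


def rocket_transform(series_list, num_kernels=100, seed=0):
    """Map each equal-length series to 2 * num_kernels features."""
    rng = random.Random(seed)
    kernels = [generate_kernel(len(series_list[0]), rng) for _ in range(num_kernels)]
    features = []
    for x in series_list:
        row = []
        for kernel in kernels:
            row.extend(apply_kernel(x, kernel))
        features.append(row)
    return features
```

The resulting feature rows would then be passed to a linear model such as ridge regression. The per-kernel loop with varying length and dilation is exactly the part that a GPU implementation has to organize differently.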
HC · May 8
Hot Wire 5D+: Evaluating Cognitive and Motor Trade-offs of Visual Feedback for 5D Augmented Reality Trajectories
Christian Masuhr, Julian Koch, Arne Wendt et al.
Augmented Reality (AR) is increasingly used to guide users through complex spatial tasks in domains such as manufacturing, non-destructive testing, and surgery. These applications often require strict compliance with 5D+ trajectories using rotation-symmetric tools (3D position, 2D orientation, and movement speed). However, the sensorimotor baselines of untrained users in these multidimensional tracing tasks, along with the cognitive-motor trade-offs induced by different visual feedback paradigms, remain underexplored. We present a controlled within-subjects user study (N=30) evaluating three distinct AR UI concepts for trajectory guidance, both with and without explicit orientation constraints. We analyzed spatial, orientational, and speed compliance based on the AR device's internal tracking, which was validated against a high-precision external optical tracking system to rule out hardware drift. By segmenting the execution into transient and steady-state phases and applying Aligned Rank Transform (ART) ANOVA, we isolated the interaction effects between visual design and task complexity. Alongside subjective metrics (NASA-TLX, SUS), our results establish conservative performance baselines for novice users performing freehand 5D trajectory following. We reveal orientation-induced cognitive-motor trade-offs and identify mitigating UI synergies. Ultimately, we provide empirical baselines and actionable design guidelines for developing effective AR guidance systems.
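The aligned rank transform is what allows interaction effects to be tested with a factorial ANOVA on data that need not be normally distributed. Below is a minimal sketch of its first step for a two-factor design, assuming factors like UI concept and orientation constraint (the level names in any usage are hypothetical) and a balanced design. Each response is stripped of all fixed effects via its cell mean, only the effect of interest is added back, and the aligned values are ranked with averaged ties. A full ART ANOVA repeats this for every effect and runs a repeated-measures ANOVA (participant as a random factor) on each set of ranks, interpreting only the effect the data were aligned for. This is not the authors' analysis code.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def art_align_and_rank(rows, effect):
    """rows: (level_a, level_b, response) tuples; effect: "A", "B" or "AB"."""
    grand = mean(y for _, _, y in rows)
    groups_a, groups_b, cells = defaultdict(list), defaultdict(list), defaultdict(list)
    for a, b, y in rows:
        groups_a[a].append(y)
        groups_b[b].append(y)
        cells[(a, b)].append(y)
    mean_a = {k: mean(v) for k, v in groups_a.items()}
    mean_b = {k: mean(v) for k, v in groups_b.items()}
    mean_cell = {k: mean(v) for k, v in cells.items()}

    aligned = []
    for a, b, y in rows:
        residual = y - mean_cell[(a, b)]  # remove all fixed effects
        if effect == "A":
            estimate = mean_a[a] - grand
        elif effect == "B":
            estimate = mean_b[b] - grand
        elif effect == "AB":
            estimate = mean_cell[(a, b)] - mean_a[a] - mean_b[b] + grand
        else:
            raise ValueError(f"unknown effect {effect!r}")
        aligned.append(residual + estimate)  # add back only the effect of interest
    return average_ranks(aligned)
```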
CV · Feb 23
Open-vocabulary 3D scene perception in industrial environments
Keno Moenck, Adrian Philip Florea, Julian Koch et al.
Autonomous vision applications in production, intralogistics, or manufacturing environments require perception capabilities beyond a small, fixed set of classes. Recent open-vocabulary methods, leveraging 2D Vision-Language Foundation Models (VLFMs), target this task but often rely on class-agnostic segmentation models pre-trained on non-industrial datasets (e.g., household scenes). In this work, we first demonstrate that such models fail to generalize, performing poorly on common industrial objects. We therefore propose a training-free, open-vocabulary 3D perception pipeline that overcomes this limitation. Instead of using a pre-trained model to generate instance proposals, our method generates masks simply by merging pre-computed superpoints based on their semantic features. We then evaluate the domain-adapted VLFM "IndustrialCLIP" on a representative 3D industrial workshop scene for open-vocabulary querying. Our qualitative results demonstrate successful segmentation of industrial objects.
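The idea of building masks by merging superpoints on their semantic features can be illustrated with a small union-find sketch. It assumes each superpoint already carries a feature vector (for instance, VLFM features aggregated from 2D views) and that an adjacency list of neighbouring superpoints is available; the cosine-similarity criterion, the threshold of 0.9, and all function names are hypothetical choices, not the paper's exact merging rule. A text query is then answered by comparing a text embedding with each mask's mean feature.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def merge_superpoints(features, edges, threshold=0.9):
    """Merge adjacent superpoints with similar features; return (members, mean_feature) masks."""
    uf = UnionFind(len(features))
    for i, j in edges:
        if cosine(features[i], features[j]) >= threshold:
            uf.union(i, j)
    groups = {}
    for i in range(len(features)):
        groups.setdefault(uf.find(i), []).append(i)
    dim = len(features[0])
    masks = []
    for members in groups.values():
        feat = [sum(features[m][d] for m in members) / len(members) for d in range(dim)]
        masks.append((sorted(members), feat))
    return masks


def query(masks, text_embedding):
    """Return the superpoint ids of the mask that best matches the text embedding."""
    return max(masks, key=lambda m: cosine(m[1], text_embedding))[0]
```

Union-find merges transitively (single linkage), so a chain of pairwise-similar neighbours ends up in one mask; that keeps the sketch simple but can over-merge gradual feature transitions.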
CV · Jun 14, 2024
Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings
Keno Moenck, Duc Trung Thieu, Julian Koch et al.
In recent years, the rise of Large Language Models (LLMs) has also encouraged the computer vision community to work on substantial multimodal datasets and train models at scale in a self-/semi-supervised manner, resulting in Vision Foundation Models (VFMs) such as Contrastive Language-Image Pre-training (CLIP). These models generalize well and perform outstandingly on everyday objects and scenes, even on downstream tasks they have not been trained on, whereas their application in specialized domains, such as an industrial context, is still an open research question. Here, fine-tuning the models or transfer learning on domain-specific data is unavoidable when aiming for adequate performance. In this work, we, on the one hand, introduce a pipeline to generate the Industrial Language-Image Dataset (ILID) from web-crawled data; on the other hand, we demonstrate effective self-supervised transfer learning and discuss downstream tasks after training on the cheaply acquired ILID, which requires no human labeling or intervention. With the proposed approach, we contribute by transferring state-of-the-art research on foundation models, transfer learning strategies, and applications to the industrial domain.
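Transfer learning on image-text pairs such as ILID typically reuses CLIP's contrastive objective, which needs no labels beyond the pairing itself. The following is a minimal plain-Python sketch of that symmetric InfoNCE loss for one batch, assuming embeddings are already computed; the default temperature of 0.07 matches CLIP's initial value, but the abstract does not state which loss variant, temperature, or trainable layers the authors used, so treat this as an illustration of the technique rather than their training setup. In practice this runs on tensors in a deep learning framework.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def log_softmax(row):
    m = max(row)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i is the positive, all other pairings in the batch are negatives."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: each row classifies which caption belongs to image i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: each column classifies which image belongs to caption j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

Matched pairs pull their embeddings together while every other caption in the batch acts as a negative, which is why larger batches of cheaply crawled pairs are useful for this objective.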
RO · Jan 5, 2022
Proxying ROS communications -- enabling containerized ROS deployments in distributed multi-host environments
Arne Wendt, Thorsten Schüppstuhl
Now that containers can be used at the edge, they offer a unified solution to the complexity of distributed multi-host ROS deployments, as well as of deploying individual ROS nodes and their dependencies. However, the bidirectional communication in ROS makes it challenging to use containerized ROS deployments alongside non-containerized ones spread over multiple machines. We analyze the communication protocol employed by ROS, the suitability of different container networking modes, and their implications for ROS deployments. Finally, we present a layer 7 transparent proxy server architecture for ROS as a solution to the identified problems, enabling the use of ROS not only in containerized environments but also proxying ROS between network segments in general.
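Assuming ROS 1, nodes register their own XML-RPC URIs with the master, and subscribers then open TCPROS connections directly to the host and port that a publisher reports via `requestTopic`. A node inside a container therefore advertises addresses that other hosts cannot reach. The minimal sketch below shows the kind of application-layer rewriting a layer 7 proxy can perform on these messages, using only Python's standard `xmlrpc.client` module. The address maps are hypothetical; a real proxy would also have to forward traffic to the rewritten endpoints, rewrite other address-carrying messages (such as publisher lists returned by the master), and fix the HTTP `Content-Length` header. It is not the architecture proposed in the paper.

```python
import xmlrpc.client

# Hypothetical map: XML-RPC URI a containerised node advertises -> URI reachable from other hosts.
URI_MAP = {"http://172.17.0.5:41234/": "http://10.0.0.2:41234/"}
# Hypothetical map for TCPROS endpoints: (container host, port) -> (proxy host, port).
ENDPOINT_MAP = {("172.17.0.5", 45678): ("10.0.0.2", 50000)}

# ROS 1 master calls whose last argument is the caller's own XML-RPC URI (caller_api).
CALLER_API_METHODS = {"registerPublisher", "registerSubscriber"}


def rewrite_master_call(body, uri_map):
    """Rewrite the caller_api argument of a master registration call (request body bytes)."""
    params, method = xmlrpc.client.loads(body)
    if (method in CALLER_API_METHODS and params
            and isinstance(params[-1], str) and params[-1] in uri_map):
        params = params[:-1] + (uri_map[params[-1]],)
    return xmlrpc.client.dumps(params, methodname=method).encode("utf-8")


def rewrite_request_topic_response(body, endpoint_map):
    """Rewrite the TCPROS host/port in a requestTopic response (response body bytes)."""
    (result,), _ = xmlrpc.client.loads(body)
    code, status, proto = result
    if code == 1 and proto and proto[0] == "TCPROS":
        key = (proto[1], proto[2])
        if key in endpoint_map:
            host, port = endpoint_map[key]
            proto = ["TCPROS", host, port]
    return xmlrpc.client.dumps(((code, status, proto),), methodresponse=True).encode("utf-8")
```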
RO · Dec 22, 2021
Semantically enriched spatial modelling of industrial indoor environments enabling location-based services
Arne Wendt, Michael Brand, Thorsten Schüppstuhl
This paper presents a concept for a software system called RAIL that represents industrial indoor environments in a dynamic spatial model, aimed at easing the development and provision of location-based services. RAIL integrates data from different sensor modalities and additional contextual information through a unified interface. Approaches to environmental modelling from other domains are reviewed and analyzed for their suitability regarding the requirements of our target domains: intralogistics and production. Subsequently, a novel way of modelling data representing indoor space and an architecture for the software system are proposed.
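The abstract does not describe RAIL's data model, so the following is only a hypothetical sketch of the general idea behind a semantically enriched spatial model: regions of the floor plan carry semantic tags, sensor observations update entity positions, and a location-based service asks questions in semantic terms ("which entities are in a storage area?") rather than raw coordinates. The class names, tag vocabulary, and last-observation-wins update rule are all assumptions; a real system would fuse modalities and model time and uncertainty.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    name: str
    polygon: list  # [(x, y), ...] vertices in floor-plan coordinates (metres)
    tags: set = field(default_factory=set)  # semantic annotations, e.g. {"storage"}


def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test: count edge crossings of a ray towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class SpatialModel:
    def __init__(self):
        self.zones = []
        self.entities = {}  # entity id -> (x, y, source sensor)

    def add_zone(self, zone):
        self.zones.append(zone)

    def update_entity(self, entity_id, x, y, source):
        # Latest observation wins in this sketch; no sensor fusion.
        self.entities[entity_id] = (x, y, source)

    def zones_at(self, x, y):
        return [z for z in self.zones if point_in_polygon(x, y, z.polygon)]

    def entities_in(self, tag):
        """Ids of entities currently located in any zone carrying the given tag."""
        return sorted(eid for eid, (x, y, _) in self.entities.items()
                      if any(tag in z.tags for z in self.zones_at(x, y)))
```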
RO · Dec 21, 2021
A Solution to the Generalized ROS Hardware IO Problem -- A Generic Modbus/TCP Device Driver for PLCs, Sensors and Actuators
Arne Wendt, Thorsten Schüppstuhl
The Robot Operating System (ROS) provides a software framework and an ecosystem of knowledge and community-supplied resources for rapidly developing and prototyping intelligent robotics applications. By standardizing communication, configuration, and invocation of software modules, ROS facilitates reuse of device-driver and algorithm implementations. Using existing implementations allows users to assemble their robotics applications from tested and known-good capabilities. Despite the efforts of the ROS-Industrial consortium and projects like ROSIN to bring ROS to industrial applications and integrate industrial hardware, we observe a lack of options to generically integrate basic physical IO. In this work, we lay out this problem and provide a solution by implementing a generic Modbus/TCP device driver for ROS.
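At the wire level, a generic Modbus/TCP driver exchanges frames consisting of a 7-byte MBAP header (transaction ID, protocol ID 0, length of the remaining bytes including the unit ID, unit ID) followed by a PDU with a function code and its arguments. The minimal sketch below, based on the public Modbus Application Protocol specification, builds Read Holding Registers (0x03) and Write Single Coil (0x05) requests and parses a register response. It is not the paper's driver and uses no ROS API; in a ROS node, these bytes would be sent over a TCP socket (port 502 by default) and the decoded values published on topics. The function names are illustrative.

```python
import struct

READ_HOLDING_REGISTERS = 0x03
WRITE_SINGLE_COIL = 0x05


def mbap(transaction_id, unit_id, pdu):
    """Prefix a PDU with the MBAP header; protocol id is 0, length counts unit id + PDU."""
    return struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id) + pdu


def read_holding_registers_request(transaction_id, unit_id, address, count):
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, address, count)
    return mbap(transaction_id, unit_id, pdu)


def write_single_coil_request(transaction_id, unit_id, address, on):
    # The specification encodes ON as 0xFF00 and OFF as 0x0000.
    pdu = struct.pack(">BHH", WRITE_SINGLE_COIL, address, 0xFF00 if on else 0x0000)
    return mbap(transaction_id, unit_id, pdu)


def parse_read_holding_registers_response(frame):
    """Return the register values from a 0x03 response frame."""
    _tid, protocol_id, _length, _unit = struct.unpack(">HHHB", frame[:7])
    if protocol_id != 0:
        raise ValueError("not a Modbus frame")
    function = frame[7]
    if function & 0x80:  # exception responses set the high bit of the function code
        raise IOError(f"Modbus exception code {frame[8]}")
    byte_count = frame[8]
    data = frame[9:9 + byte_count]
    return list(struct.unpack(f">{byte_count // 2}H", data))
```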