Keno Moenck

h-index6

4papers

17citations

Novelty25%

AI Score39

Ranked #80,987 of 194,257 authors (top 42%)#27,341 in CV (top 46%)

4 Papers

5.9CVJul 24, 2023

Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul

Keno Moenck, Arne Wendt, Philipp Prünte et al.

Deploying deep learning-based applications in specialized domains like the aircraft production industry typically suffers from the training data availability problem. Only a few datasets represent non-everyday objects, situations, and tasks. Recent advantages in research around Vision Foundation Models (VFM) opened a new area of tasks and models with high generalization capabilities in non-semantic and semantic predictions. As recently demonstrated by the Segment Anything Project, exploiting VFM's zero-shot capabilities is a promising direction in tackling the boundaries spanned by data, context, and sensor variety. Although, investigating its application within specific domains is subject to ongoing research. This paper contributes here by surveying applications of the SAM in aircraft production-specific use cases. We include manufacturing, intralogistics, as well as maintenance, repair, and overhaul processes, also representing a variety of other neighboring industrial domains. Besides presenting the various use cases, we further discuss the injection of domain knowledge.

1.4LGJan 23Code

CUROCKET: Optimizing ROCKET for GPU

Ole Stüven, Keno Moenck, Thorsten Schüppstuhl

ROCKET (RandOm Convolutional KErnel Transform) is a feature extraction algorithm created for Time Series Classification (TSC), published in 2019. It applies convolution with randomly generated kernels on a time series, producing features that can be used to train a linear classifier or regressor like Ridge. At the time of publication, ROCKET was on par with the best state-of-the-art algorithms for TSC in terms of accuracy while being significantly less computationally expensive, making ROCKET a compelling algorithm for TSC. This also led to several subsequent versions, further improving accuracy and computational efficiency. The currently available ROCKET implementations are mostly bound to execution on CPU. However, convolution is a task that can be highly parallelized and is therefore suited to be executed on GPU, which speeds up the computation significantly. A key difficulty arises from the inhomogeneous kernels ROCKET uses, making standard methods for applying convolution on GPU inefficient. In this work, we propose an algorithm that is able to efficiently perform ROCKET on GPU and achieves up to 11 times higher computational efficiency per watt than ROCKET on CPU. The code for CUROCKET is available in this repository https://github.com/oleeven/CUROCKET on github.

1.5CVFeb 23

Open-vocabulary 3D scene perception in industrial environments

Keno Moenck, Adrian Philip Florea, Julian Koch et al.

Autonomous vision applications in production, intralogistics, or manufacturing environments require perception capabilities beyond a small, fixed set of classes. Recent open-vocabulary methods, leveraging 2D Vision-Language Foundation Models (VLFMs), target this task but often rely on class-agnostic segmentation models pre-trained on non-industrial datasets (e.g., household scenes). In this work, we first demonstrate that such models fail to generalize, performing poorly on common industrial objects. Therefore, we propose a training-free, open-vocabulary 3D perception pipeline that overcomes this limitation. Instead of using a pre-trained model to generate instance proposals, our method simply generates masks by merging pre-computed superpoints based on their semantic features. Following, we evaluate the domain-adapted VLFM "IndustrialCLIP" on a representative 3D industrial workshop scene for open-vocabulary querying. Our qualitative results demonstrate successful segmentation of industrial objects.

7.6CVJun 14, 2024Code

Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings

Keno Moenck, Duc Trung Thieu, Julian Koch et al.

In recent years, the upstream of Large Language Models (LLM) has also encouraged the computer vision community to work on substantial multimodal datasets and train models on a scale in a self-/semi-supervised manner, resulting in Vision Foundation Models (VFM), as, e.g., Contrastive Language-Image Pre-training (CLIP). The models generalize well and perform outstandingly on everyday objects or scenes, even on downstream tasks, tasks the model has not been trained on, while the application in specialized domains, as in an industrial context, is still an open research question. Here, fine-tuning the models or transfer learning on domain-specific data is unavoidable when objecting to adequate performance. In this work, we, on the one hand, introduce a pipeline to generate the Industrial Language-Image Dataset (ILID) based on web-crawled data; on the other hand, we demonstrate effective self-supervised transfer learning and discussing downstream tasks after training on the cheaply acquired ILID, which does not necessitate human labeling or intervention. With the proposed approach, we contribute by transferring approaches from state-of-the-art research around foundation models, transfer learning strategies, and applications to the industrial domain.