Sebastian Hartwig

CV
h-index: 15
Papers: 5
Citations: 34
Novelty: 32%
AI Score: 22

5 Papers

LG · Apr 27, 2023
HPSCAN: Human Perception-Based Scattered Data Clustering

Sebastian Hartwig, Christian van Onzenoodt, Dominik Engel et al.

Cluster separation is a task typically tackled by widely used clustering techniques, such as k-means or DBSCAN. However, these algorithms are based on non-perceptual metrics, and our experiments demonstrate that their output does not reflect human cluster perception. To bridge the gap between human cluster perception and machine-computed clusters, we propose HPSCAN, a learning strategy that operates directly on scattered data. To learn perceptual cluster separation on such data, we crowdsourced the labeling of 7,320 bivariate (scatterplot) datasets to 384 human participants. We train our HPSCAN model on these human-annotated data. Instead of rendering these data as scatterplot images, we use their x and y point coordinates as input to a modified PointNet++ architecture, enabling direct inference on point clouds. In this work, we provide details on how we collected our dataset, report statistics of the resulting annotations, and investigate the perceptual agreement of cluster separation for real-world data. We also report the training and evaluation protocol for HPSCAN and introduce a novel metric that measures the agreement between a clustering technique and a group of human annotators. We explore predicting point-wise human agreement to detect ambiguities. Finally, we compare our approach to ten established clustering techniques and demonstrate that HPSCAN is capable of generalizing to unseen and out-of-scope data.

CV · Jun 27, 2022
Monocular Depth Decomposition of Semi-Transparent Volume Renderings

Dominik Engel, Sebastian Hartwig, Timo Ropinski

Neural networks have shown great success in extracting geometric information from color images. In particular, monocular depth estimation networks are increasingly reliable in real-world scenes. In this work, we investigate the applicability of such monocular depth estimation networks to semi-transparent volume-rendered images. As depth is notoriously difficult to define in a volumetric scene without clearly defined surfaces, we consider different depth computations that have emerged in practice, and compare state-of-the-art monocular depth estimation approaches for these different interpretations in an evaluation that considers different degrees of opacity in the renderings. Additionally, we investigate how these networks can be extended to further obtain color and opacity information, in order to create a layered representation of the scene based on a single color image. This layered representation consists of spatially separated semi-transparent intervals that composite to the original input rendering. In our experiments, we show that existing approaches to monocular depth estimation can be adapted to perform well on semi-transparent volume renderings, which has several applications in the area of scientific visualization, such as re-composition with additional objects and labels or additional shading.
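As an illustration of what it means for semi-transparent layers to "composite to the original input rendering", here is a minimal sketch of front-to-back alpha compositing. It assumes the standard "over" operator with non-premultiplied colors; the abstract does not state which compositing scheme the paper uses, and the function name `composite_front_to_back` and the layer format are hypothetical.

```python
def composite_front_to_back(layers):
    """Composite semi-transparent layers, ordered nearest to farthest.

    Each layer is ((r, g, b), alpha) with non-premultiplied color and
    alpha in [0, 1]. Assumption: the standard "over" operator, which is
    not confirmed by the abstract.
    Returns the composited color and the accumulated opacity.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for rgb, alpha in layers:
        weight = transmittance * alpha  # contribution of this layer
        for i in range(3):
            color[i] += weight * rgb[i]
        transmittance *= 1.0 - alpha
    return tuple(color), 1.0 - transmittance


# A half-transparent red layer in front of an opaque blue layer.
color, opacity = composite_front_to_back([
    ((1.0, 0.0, 0.0), 0.5),
    ((0.0, 0.0, 1.0), 1.0),
])
```

Because each layer is weighted by the transmittance accumulated in front of it, inserting an additional object or label between two layers only requires adding one more entry at the right depth position in the list.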

CV · Mar 18, 2024
A Survey on Quality Metrics for Text-to-Image Generation

Sebastian Hartwig, Dominik Engel, Leon Sick et al.

AI-based text-to-image models not only excel at generating realistic images but also give designers increasingly fine-grained control over the image content. Consequently, these approaches have gathered increased attention within the computer graphics research community, which has been historically devoted to traditional rendering techniques that offer precise control over scene parameters (e.g., objects, materials, and lighting). While the quality of conventionally rendered images is assessed through well-established image quality metrics, such as SSIM or PSNR, the unique challenges of text-to-image generation require other, dedicated quality metrics. These metrics must be able to measure not only overall image quality but also how well images reflect given text prompts, whereby the control of scene and rendering parameters is interwoven. Within this survey, we provide a comprehensive overview of such text-to-image quality metrics and propose a taxonomy to categorize these metrics. Our taxonomy is grounded in the assumption that there are two main quality criteria, namely compositional quality and general quality, that contribute to the overall image quality. Besides the metrics, this survey covers dedicated text-to-image benchmark datasets, over which the metrics are frequently computed. Finally, we identify limitations and open challenges in the field of text-to-image generation, and derive guidelines for practitioners conducting text-to-image evaluation.
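For readers unfamiliar with the conventional metrics named above, the following is a minimal sketch of PSNR using its standard definition, 10 · log10(MAX² / MSE). It is not taken from the survey; the function name `psnr`, the flat-list image format, and the 8-bit default for `max_value` are assumptions made for illustration.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of pixel intensities (an
    illustrative simplification). max_value is the largest possible
    intensity, assumed here to be 255 for 8-bit images.
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


score = psnr([0, 128, 255, 64], [0, 130, 250, 64])
```

PSNR compares a result against a pixel-aligned reference image, which a text prompt does not provide; this is one reason the abstract calls for dedicated text-to-image metrics.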

CV · Nov 25, 2024
CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation

Leon Sick, Dominik Engel, Sebastian Hartwig et al.

Traditionally, algorithms that learn to segment object instances in 2D images have heavily relied on large amounts of human-annotated data. Only recently, novel approaches have emerged tackling this problem in an unsupervised fashion. Generally, these approaches first generate pseudo-masks and then train a class-agnostic detector. While such methods deliver the current state of the art, they often fail to correctly separate instances overlapping in 2D image space since only semantics are considered. To tackle this issue, we instead propose to cut the semantic masks in 3D to obtain the final 2D instances by utilizing a point cloud representation of the scene. Furthermore, we derive a Spatial Importance function, which we use to resharpen the semantics along the 3D borders of instances. Nevertheless, these pseudo-masks are still subject to mask ambiguity. To address this issue, we further propose to augment the training of a class-agnostic detector with three Spatial Confidence components aiming to isolate a clean learning signal. With these contributions, our approach outperforms competing methods across multiple standard benchmarks for unsupervised instance segmentation and object detection.

CV · Mar 29, 2019
Training Object Detectors on Synthetic Images Containing Reflecting Materials

Sebastian Hartwig, Timo Ropinski

One of the grand challenges of deep learning is the requirement to obtain large labeled training data sets. While synthesized data sets can be used to overcome this challenge, it is important that these data sets close the reality gap, i.e., that a model trained on synthetic image data is able to generalize to real images. Whereas the reality gap can be considered bridged in several application scenarios, training on synthesized images containing reflecting materials requires further research. Since the appearance of objects with reflecting materials is dominated by the surrounding environment, this interaction needs to be considered during training data generation. Therefore, within this paper, we examine the effect of reflecting materials in the context of synthetic image generation for training object detectors. We investigate the influence of the rendering approach used for image synthesis, the effect of domain randomization, and the amount of training data used. To be able to compare our results to the state of the art, we focus on indoor scenes, as they have been investigated extensively. Within this scenario, bathroom furniture is a natural choice for objects with reflecting materials, for which we report our findings on real and synthetic testing data.