CV · Nov 23, 2023 · Code
Low Latency Instance Segmentation by Continuous Clustering for LiDAR Sensors
Andreas Reich, Mirko Maehlisch
Low-latency instance segmentation of LiDAR point clouds is crucial in real-world applications because it serves as an initial and frequently used building block in a robot's perception pipeline, where every task adds further delay. Particularly in dynamic environments, this total delay can result in significant positional offsets of dynamic objects, as seen in highway scenarios. To address this issue, we employ a new technique, which we call continuous clustering. Unlike most existing clustering approaches, which use a full revolution of the LiDAR sensor, we process the data stream in a continuous and seamless fashion. Our approach does not rely on the concept of complete or partial sensor rotations with multiple discrete range images; instead, it views the range image as a single entity that grows horizontally without bound. Each new column of this continuous range image is processed as soon as it is available. Obstacle points are clustered to existing instances in real time, and a high-frequency check determines which instances are complete so that they can be published without waiting for the completion of the revolution or some other integration period. In the case of rotating sensors, no problematic discontinuities between the points of the end and the start of a scan are observed. In this work, we describe the two-layered data structure and the corresponding algorithm for continuous clustering. It achieves an average latency of just 5 ms with respect to the latest timestamp of all points in the cluster. We are publishing the source code at https://github.com/UniBwTAS/continuous_clustering.
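The column-by-column idea can be illustrated with a minimal Python sketch. This is not the authors' two-layered data structure or their published implementation; it is a simplified toy under stated assumptions: each incoming column is a list of ranges per row (`None` for non-obstacle points), two points are linked when they are direct neighbours (the row above in the same column, or the same row in the previous column) and their ranges differ by at most a hypothetical threshold `max_range_diff`. The class name `ContinuousClusterer` and all parameters are illustrative.

```python
class ContinuousClusterer:
    """Toy streaming clusterer over an endlessly growing range image."""

    def __init__(self, max_range_diff=0.5):
        self.max_range_diff = max_range_diff  # assumed linking threshold (metres)
        self.points = {}    # cluster id -> list of (column, row, range)
        self.last_col = {}  # cluster id -> last column that added a point
        self.prev = {}      # row -> (range, cluster id) for the previous column
        self.next_id = 0
        self.col = 0

    def _merge(self, keep, drop, current):
        # Fold cluster `drop` into `keep` and relabel neighbour lookups.
        self.points[keep].extend(self.points.pop(drop))
        self.last_col[keep] = max(self.last_col[keep], self.last_col.pop(drop))
        for table in (self.prev, current):
            for row, (rng, cid) in list(table.items()):
                if cid == drop:
                    table[row] = (rng, keep)

    def push_column(self, ranges):
        """Process one new column; return the clusters that are now complete."""
        current = {}
        for row, rng in enumerate(ranges):
            if rng is None:
                continue
            cid = None
            # Neighbours: row above in this column, same row in previous column.
            for n_row, table in ((row - 1, current), (row, self.prev)):
                neighbour = table.get(n_row)
                if neighbour is None or abs(neighbour[0] - rng) > self.max_range_diff:
                    continue
                if cid is None:
                    cid = neighbour[1]
                elif neighbour[1] != cid:
                    self._merge(cid, neighbour[1], current)
            if cid is None:
                cid = self.next_id
                self.next_id += 1
                self.points[cid] = []
            self.points[cid].append((self.col, row, rng))
            self.last_col[cid] = self.col
            current[row] = (rng, cid)
        # A cluster that received no point in this column can never grow again
        # (links only reach back one column), so it is published immediately.
        finished = [cid for cid, last in self.last_col.items() if last < self.col]
        done = []
        for cid in finished:
            del self.last_col[cid]
            done.append(self.points.pop(cid))
        self.prev = current
        self.col += 1
        return done


clusterer = ContinuousClusterer(max_range_diff=0.5)
stream = [
    [5.0, 5.1, None, 20.0],
    [5.0, None, None, 20.1],
    [None, None, None, 20.2],
    [None, None, None, None],
]
published = [clusterer.push_column(column) for column in stream]
```

In this toy stream the near object is published while the third column is being processed, and the far object one column later; neither waits for a full revolution. A real system would also need to handle gaps of several columns, ground removal, and timestamps, which this sketch omits.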
CL · Jul 29, 2025 · Code
Introducing HALC: A general pipeline for finding optimal prompting strategies for automated coding with LLMs in the computational social sciences
Andreas Reich, Claudia Thoms, Tobias Schrimpf
LLMs are seeing widespread use for task automation, including automated coding in the social sciences. However, even though researchers have proposed different prompting strategies, their effectiveness varies across LLMs and tasks, and trial-and-error practices remain widespread. We propose HALC, a general pipeline that allows for the systematic and reliable construction of optimal prompts for any given coding task and model, permitting the integration of any prompting strategy deemed relevant. To investigate LLM coding and validate our pipeline, we sent a total of 1,512 individual prompts to our local LLMs in over two million requests. We test prompting strategies and LLM task performance based on a few expert codings (ground truth). When compared to these expert codings, we find prompts that code reliably for single variables ($\alpha_{\text{climate}} = .76$; $\alpha_{\text{movement}} = .78$) and across two variables ($\alpha_{\text{climate}} = .71$; $\alpha_{\text{movement}} = .74$) using the LLM Mistral NeMo. Our prompting strategies are set up in a way that aligns the LLM to our codebook; we are not optimizing our codebook for LLM friendliness. Our paper provides insights into the effectiveness of different prompting strategies, crucial influencing factors, and the identification of reliable prompts for each coding task and model.
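The selection step described above, scoring each candidate prompt against expert codings and keeping the most reliable one, could be sketched as follows. The abstract does not name the reliability coefficient; this sketch assumes that α denotes Krippendorff's alpha for nominal data with two coders (expert and LLM) and no missing values. The prompt names and codings are invented for illustration and are not taken from the paper.

```python
from collections import Counter


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal categories, no missing data."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of units")
    n = 2 * len(coder_a)                         # total number of codings
    totals = Counter(coder_a) + Counter(coder_b)  # how often each category occurs
    # Each disagreeing unit contributes two off-diagonal coincidences.
    observed = 2 * sum(a != b for a, b in zip(coder_a, coder_b))
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k)
    if expected == 0:
        return float("nan")  # undefined when only one category is ever used
    return 1.0 - (n - 1) * observed / expected


def select_best_prompt(expert, codings_by_prompt):
    """Return (prompt name, alpha) of the prompt that agrees best with the expert."""
    scored = {name: krippendorff_alpha_nominal(expert, codes)
              for name, codes in codings_by_prompt.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical ground truth and LLM outputs for two candidate prompts.
expert = ["climate", "other", "climate", "other", "climate", "other"]
codings_by_prompt = {
    "zero_shot": ["climate", "climate", "climate", "other", "other", "other"],
    "codebook_plus_examples": ["climate", "other", "climate", "other", "climate", "other"],
}
best_name, best_alpha = select_best_prompt(expert, codings_by_prompt)
```

Comparing against a chance-corrected coefficient rather than raw accuracy matters when one category dominates, since a prompt that always answers with the majority label would otherwise look deceptively good.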
CV · Sep 25, 2024
MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features
Katharina Anderer, Andreas Reich, Matthias Wölfel
This paper presents a benchmark dataset for aligning lecture videos with corresponding slides and introduces a novel multimodal algorithm leveraging features from speech, text, and images. The algorithm achieves an average accuracy of 0.82, compared with 0.56 for SIFT, while being approximately 11 times faster. Using dynamic programming, the algorithm tries to determine the optimal slide sequence. The results show that penalizing slide transitions increases accuracy. Features obtained via optical character recognition (OCR) contribute the most to a high matching accuracy, followed by image features. The findings highlight that audio transcripts alone provide valuable information for alignment and are beneficial if OCR data is lacking. Variations in matching accuracy across different lectures highlight the challenges associated with video quality and lecture style. The novel multimodal algorithm demonstrates robustness to some of these challenges, underscoring the potential of the approach.
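To make the role of the transition penalty concrete, here is a minimal Viterbi-style sketch in Python. It is not the paper's algorithm: it assumes a precomputed similarity matrix `scores[t][s]` between video segment `t` and slide `s` (in the paper such scores would come from speech, OCR, and image features) and a single hypothetical constant `switch_penalty` subtracted whenever consecutive segments are assigned different slides.

```python
def align_slides(scores, switch_penalty):
    """Return the slide index per segment that maximises total similarity
    minus `switch_penalty` for every change of slide between segments."""
    n_slides = len(scores[0])
    best = list(scores[0])  # best[s]: best path score ending in slide s
    back = []               # back[t-1][s]: predecessor slide of s at segment t

    for t in range(1, len(scores)):
        new_best, pointers = [], []
        for s in range(n_slides):
            # Staying on the same slide is free; switching costs the penalty.
            def cost(p, s=s):
                return best[p] - (switch_penalty if p != s else 0.0)
            prev = max(range(n_slides), key=cost)
            new_best.append(cost(prev) + scores[t][s])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the best path backwards from the best final slide.
    s = max(range(n_slides), key=lambda i: best[i])
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    path.reverse()
    return path


# Toy similarities for 5 video segments and 2 slides; segment 1 is noisy.
scores = [
    [0.9, 0.1],
    [0.4, 0.6],
    [0.9, 0.1],
    [0.1, 0.9],
    [0.1, 0.9],
]
greedy = align_slides(scores, switch_penalty=0.0)    # [0, 1, 0, 1, 1]
smoothed = align_slides(scores, switch_penalty=0.5)  # [0, 0, 0, 1, 1]
```

With no penalty the path follows the per-segment maximum and jumps to the second slide for a single noisy segment; with the penalty that spurious jump is suppressed while the genuine transition is kept, which is one plausible reading of why penalizing transitions helps.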