CVJul 8, 2022Code
Bounding Box Disparity: 3D Metrics for Object Detection With Full Degree of FreedomMichael G. Adam, Martin Piccolrovazzi, Sebastian Eger et al.
The most popular evaluation metric for object detection in 2D images is Intersection over Union (IoU). Existing implementations of the IoU metric for 3D object detection usually neglect one or more degrees of freedom. In this paper, we first derive the analytic solution for three dimensional bounding boxes. As a second contribution, a closed-form solution of the volume-to-volume distance is derived. Finally, the Bounding Box Disparity is proposed as a combined positive continuous metric. We provide open source implementations of the three metrics as standalone python functions, as well as extensions to the Open3D library and as ROS nodes.
ROOct 9, 2023
Care3D: An Active 3D Object Detection Dataset of Real Robotic-Care EnvironmentsMichael G. Adam, Sebastian Eger, Martin Piccolrovazzi et al.
As labor shortage increases in the health sector, the demand for assistive robotics grows. However, the needed test data to develop those robots is scarce, especially for the application of active 3D object detection, where no real data exists at all. This short paper counters this by introducing such an annotated dataset of real environments. The captured environments represent areas which are already in use in the field of robotic health care research. We further provide ground truth data within one room, for assessing SLAM algorithms running directly on a health care robot.
26.2CVMay 7
A Causal Diffusion Model for Video Reconstruction from Ultra-Low-Bitrate RepresentationsCem Eteke, Batuhan Tosun, Martin Piccolrovazzi et al.
We study video reconstruction from ultra-low-bitrate representations, where the primary challenge shifts from encoding to decoding. In this regime, reconstruction with classical and neural codecs introduces blur, while generative and semantic approaches often struggle to jointly preserve fidelity, temporal consistency, and perceptual quality. To address these limitations, we propose a causal video diffusion model that reconstructs videos from ultra-low-bitrate semantics and highly compressed frames by jointly modeling their complementary information. We further introduce temporal-only distillation from a bidirectional teacher to enable parameter-efficient training and causal few-step inference. Through extensive quantitative, qualitative, and subjective evaluation, we show that the proposed method outperforms classical, neural, generative, and semantic baselines in ultra-low-bitrate video reconstruction.
CVFeb 27, 2024
ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily LivingMarsil Zakour, Partha Pratim Nath, Ludwig Lohmer et al.
Hand-Object Interactions (HOIs) are conditioned on spatial and temporal contexts like surrounding objects, previous actions, and future intents (for example, grasping and handover actions vary greatly based on objects proximity and trajectory obstruction). However, existing datasets for 4D HOI (3D HOI over time) are limited to one subject interacting with one object only. This restricts the generalization of learning-based HOI methods trained on those datasets. We introduce ADL4D, a dataset of up to two subjects interacting with different sets of objects performing Activities of Daily Living (ADL) like breakfast or lunch preparation activities. The transition between multiple objects to complete a certain task over time introduces a unique context lacking in existing datasets. Our dataset consists of 75 sequences with a total of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action annotations. We develop an automatic system for multi-view multi-hand 3D pose annotation capable of tracking hand poses over time. We integrate and test it against publicly available datasets. Finally, we evaluate our dataset on the tasks of Hand Mesh Recovery (HMR) and Hand Action Segmentation (HAS).