CV · Feb 27, 2024

ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living

arXiv:2402.17758v1 · 2 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This paper addresses the limited generalization of learning-based hand-object interaction methods used in robotics and human-computer interaction. The contribution is incremental, since it extends the capabilities of existing datasets.

The authors address the limitations of existing 4D hand-object interaction datasets by introducing ADL4D, a dataset in which up to two subjects interact with multiple objects during daily activities. It comprises 75 sequences totaling 1.1M RGB-D frames and is evaluated on hand mesh recovery and hand action segmentation.

Hand-Object Interactions (HOIs) are conditioned on spatial and temporal contexts like surrounding objects, previous actions, and future intents (for example, grasping and handover actions vary greatly based on object proximity and trajectory obstruction). However, existing datasets for 4D HOI (3D HOI over time) are limited to one subject interacting with one object only. This restricts the generalization of learning-based HOI methods trained on those datasets. We introduce ADL4D, a dataset of up to two subjects interacting with different sets of objects performing Activities of Daily Living (ADL) like breakfast or lunch preparation activities. The transition between multiple objects to complete a certain task over time introduces a unique context lacking in existing datasets. Our dataset consists of 75 sequences with a total of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action annotations. We develop an automatic system for multi-view multi-hand 3D pose annotation capable of tracking hand poses over time. We integrate and test it against publicly available datasets. Finally, we evaluate our dataset on the tasks of Hand Mesh Recovery (HMR) and Hand Action Segmentation (HAS).

