FEEL (Force-Enhanced Egocentric Learning): A Dataset for Physical Action Understanding
This work addresses the need for better physical interaction understanding in robotics and AI, though its contribution is incremental: it focuses on dataset creation and on applying that dataset to existing tasks.
The authors tackle the problem of physical action understanding by introducing FEEL, a large-scale dataset pairing force measurements from custom gloves with egocentric video, containing about 3 million force-synchronized frames. They demonstrate its utility by achieving state-of-the-art temporal contact segmentation and competitive pixel-level segmentation without manual annotations, and by improving transfer performance on action understanding tasks across multiple datasets.
We introduce FEEL (Force-Enhanced Egocentric Learning), the first large-scale dataset pairing force measurements gathered from custom piezoresistive gloves with egocentric video. Our gloves enable scalable data collection, and FEEL contains approximately 3 million force-synchronized frames of natural, unscripted manipulation in kitchen environments, with 45% of frames involving hand-object contact. Because force is the underlying cause that drives physical interaction, it is a critical primitive for physical action understanding. We demonstrate the utility of force for physical action understanding by applying FEEL to two families of tasks: (1) contact understanding, where we jointly perform temporal contact segmentation and pixel-level contacted object segmentation; and (2) action representation learning, where force prediction serves as a self-supervised pretraining objective for video backbones. We achieve state-of-the-art temporal contact segmentation results and competitive pixel-level segmentation results without any manual contacted object segmentation annotations. Furthermore, we demonstrate that action representation learning with FEEL, which requires no manual labels, improves transfer performance on action understanding tasks across EPIC-Kitchens, SomethingSomething-V2, EgoExo4D and Meccano.
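To make concrete how force readings synchronized to video frames could stand in for manual contact annotation, the following is a minimal sketch that turns a per-frame force signal into temporal contact segments by simple thresholding. It is an illustration only and is not the method used in the paper: the function names, the threshold value, the minimum segment length, and the assumption of a single scalar force value per frame are all hypothetical.

```python
from typing import List, Tuple


def contact_segments(
    force: List[float],
    threshold: float = 0.5,  # hypothetical contact threshold, arbitrary units
    min_len: int = 2,        # hypothetical minimum segment length in frames
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges where force >= threshold.

    Runs shorter than min_len frames are discarded as likely sensor noise.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current contact run began, if any
    for i, value in enumerate(force):
        in_contact = value >= threshold
        if in_contact and start is None:
            start = i  # a contact run begins at this frame
        elif not in_contact and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the contact run has ended
    # Close a run that extends to the last frame.
    if start is not None and len(force) - start >= min_len:
        segments.append((start, len(force)))
    return segments


def contact_fraction(force: List[float], threshold: float = 0.5) -> float:
    """Fraction of frames whose force reading is at or above the threshold."""
    if not force:
        return 0.0
    return sum(value >= threshold for value in force) / len(force)


if __name__ == "__main__":
    # Synthetic per-frame force values, invented for illustration.
    readings = [0.0, 0.1, 0.9, 1.2, 0.8, 0.0, 0.7, 0.0, 0.6, 0.9]
    print(contact_segments(readings))
    print(contact_fraction(readings))
```

A statistic such as the reported 45% of frames involving hand-object contact is the kind of quantity `contact_fraction` computes, although how the dataset actually defines contact is not specified here.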