CV · HC · NE · Jun 14, 2024

PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos

arXiv:2407.09503v2 · 8 citations
AI Analysis

This work provides a dataset for researchers and developers building action recommendation systems in augmented and virtual reality, addressing a gap in ego-centric video analysis.

The authors address the lack of action recommendation annotations in ego-centric video datasets by releasing PARSE-Ego4D. It contains over 18,000 context-aware action suggestions generated by a prompt-engineered LLM and validated through large-scale human annotation, and it enables new tasks for intelligent assistance systems.

Intelligent assistance involves not only understanding but also action. Existing ego-centric video datasets contain rich annotations of the videos, but not of the actions that an intelligent assistant could perform in the moment. To address this gap, we release PARSE-Ego4D, a new set of personal action recommendation annotations for the Ego4D dataset. We take a multi-stage approach to generating and evaluating these annotations. First, we use a prompt-engineered large language model (LLM) to generate context-aware action suggestions, identifying over 18,000 of them. While these synthetic action suggestions are valuable, the inherent limitations of LLMs necessitate human evaluation. To ensure high-quality, user-centered recommendations, we conduct a large-scale human annotation study that grounds all of PARSE-Ego4D in human preferences. We analyze the inter-rater agreement and evaluate participants' subjective preferences. Based on our synthetic dataset and complete human annotations, we propose several new tasks for action suggestion from ego-centric videos. We encourage novel solutions that reduce latency and energy requirements. The annotations in PARSE-Ego4D will support researchers and developers who are building action recommendation systems for augmented and virtual reality.
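The abstract mentions analyzing inter-rater agreement but does not name the statistic used. As an illustration only, and assuming each action suggestion was rated by the same number of annotators on a categorical scale, Fleiss' kappa is one standard measure of agreement beyond chance. The function name `fleiss_kappa` and the binary reject/accept scale in the example are hypothetical choices for this sketch, not details taken from PARSE-Ego4D.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]], num_categories: int) -> float:
    """Fleiss' kappa for items that were each rated by the same number of raters.

    ratings[i] lists the category labels (0 .. num_categories - 1) that the
    raters assigned to item i.
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n = len(ratings[0])  # raters per item
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    if any(not 0 <= label < num_categories for item in ratings for label in item):
        raise ValueError("labels must lie in 0 .. num_categories - 1")

    num_items = len(ratings)
    # counts[i][j]: number of raters who put item i in category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[j] for j in range(num_categories)])

    # Observed agreement per item, P_i, averaged into P_bar
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / num_items

    # Chance agreement P_e from the marginal category proportions p_j
    p_cat = [sum(row[j] for row in counts) / (num_items * n) for j in range(num_categories)]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # Every rating falls in one category: kappa is undefined (0 / 0)
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical example: 4 suggestions, 3 raters each, 0 = reject, 1 = accept
    example = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 0, 1]]
    print(round(fleiss_kappa(example, 2), 3))  # 0.333
```

In the example, raters agree more often than chance would predict (kappa = 1/3) but fall well short of perfect agreement (kappa = 1). If raters differed per item or the scale were ordinal, a statistic such as Krippendorff's alpha would be the more natural choice.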
