UnweaveNet: Unweaving Activity Stories
This addresses the challenge of understanding complex activity sequences in egocentric video analysis, although the contribution appears incremental, since it builds on existing datasets and methods.
The paper tackles the problem of parsing unscripted daily activity videos into their constituent activity threads. It introduces UnweaveNet, which combines a thread bank representation with a neural controller, trains and evaluates it on sequences from the EPIC-KITCHENS dataset, and shows the benefit of self-supervised pretraining.
Our lives can be seen as a complex weaving of activities; we switch from one activity to another, to maximise our achievements or in reaction to demands placed upon us. Observing a video of unscripted daily activities, we parse the video into its constituent activity threads through a process we call unweaving. To accomplish this, we introduce a video representation that explicitly captures activity threads, called a thread bank, along with a neural controller capable of detecting goal changes and the resumption of past activities; together these form UnweaveNet. We train and evaluate UnweaveNet on sequences from the unscripted egocentric dataset EPIC-KITCHENS. We also propose self-supervised pretraining for UnweaveNet and showcase its efficacy.
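To make the thread bank idea concrete, here is a minimal Python sketch under loud assumptions: each clip is already encoded as a feature vector, each thread is summarised by a running mean of its clips, and the learned neural controller is replaced by a fixed cosine-similarity threshold. The names `ThreadBank`, `unweave`, and `threshold` are hypothetical and do not come from the paper. The sketch only illustrates the decisions described above: continue a thread, resume a past thread, or start a new thread when the goal changes.

```python
import numpy as np


class ThreadBank:
    """Hypothetical thread bank: one running-mean embedding per activity thread."""

    def __init__(self):
        self.summaries = []  # one summary vector per thread
        self.counts = []     # number of clips assigned to each thread

    def new_thread(self, feature):
        # Open a new thread seeded with this clip's feature.
        self.summaries.append(feature.copy())
        self.counts.append(1)
        return len(self.summaries) - 1

    def add(self, thread_id, feature):
        # Incremental mean update of the chosen thread's summary.
        self.counts[thread_id] += 1
        n = self.counts[thread_id]
        self.summaries[thread_id] += (feature - self.summaries[thread_id]) / n


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def unweave(features, threshold=0.8):
    """Assign each clip to a thread id.

    Assumption: a fixed similarity rule stands in for the learned controller.
    """
    bank = ThreadBank()
    labels = []
    for f in features:
        f = np.asarray(f, dtype=float)
        if not bank.summaries:
            labels.append(bank.new_thread(f))
            continue
        sims = [cosine(f, s) for s in bank.summaries]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            # Continue the current thread or resume a past one.
            bank.add(best, f)
            labels.append(best)
        else:
            # No thread matches well enough: treat it as a goal change.
            labels.append(bank.new_thread(f))
    return labels
```

For example, `unweave([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 0.05]])` returns `[0, 0, 1, 1, 0]`: the last clip resumes thread 0 after an interleaved second activity. A learned controller would make these decisions with trained parameters rather than a hand-set threshold, which this sketch does not attempt.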