Unifying Few- and Zero-Shot Egocentric Action Recognition
This work addresses the practical challenge of an effectively unbounded set of action classes in egocentric vision, though the contribution is incremental because it builds on existing methods and datasets.
The paper tackles the problem of open-set egocentric action recognition by unifying few- and zero-shot generalization, proposing new dataset splits from EPIC-KITCHENS and showing that adding a metric-learning loss improves zero-shot classification by up to 10% without harming few-shot performance.
Although there has been significant research in egocentric action recognition, most methods and tasks, including EPIC-KITCHENS, assume a fixed set of action classes. Fixed-set classification is useful for benchmarking methods, but is often unrealistic in practical settings because actions are compositional, which makes the label set functionally infinite. In this work, we explore generalization with an open set of classes by unifying two popular approaches: few- and zero-shot generalization (the latter of which we reframe as cross-modal few-shot generalization). We propose a new set of splits derived from the EPIC-KITCHENS dataset that allow evaluation of open-set classification, and use these splits to show that adding a metric-learning loss to the conventional direct-alignment baseline can improve zero-shot classification by as much as 10%, while not sacrificing few-shot performance.
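The reframing of zero-shot recognition as cross-modal few-shot recognition can be illustrated with a minimal sketch. It assumes a shared embedding space in which each class is represented by a prototype: the mean of a few video embeddings (few-shot) or a single text embedding of the class name (zero-shot). The abstract does not specify the loss forms, so the squared-distance alignment term, the cosine triplet term, the `weight` and `margin` parameters, and all function names below are illustrative assumptions rather than the paper's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def build_prototypes(support):
    """Map each class label to one prototype embedding.

    `support[label]` is a list of embeddings: several video embeddings
    (few-shot) or a single text embedding (zero-shot, treated here as
    cross-modal one-shot). Both cases use the same code path.
    """
    return {label: mean_vector(embs) for label, embs in support.items()}

def classify(query, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))

def alignment_loss(video, text):
    """Assumed direct-alignment term: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(video, text))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Assumed metric-learning term: cosine-distance triplet loss."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def combined_loss(video, text_pos, text_neg, weight=1.0, margin=0.2):
    """Alignment baseline plus a weighted metric-learning term."""
    return alignment_loss(video, text_pos) + weight * triplet_loss(
        video, text_pos, text_neg, margin
    )

# Toy 2-D embeddings: one class seen through two video clips, another
# through a single text embedding only.
prototypes = build_prototypes({
    "open door": [[1.0, 0.0], [0.8, 0.2]],  # few-shot: video embeddings
    "cut onion": [[0.0, 1.0]],              # zero-shot: text embedding
})
prediction = classify([0.9, 0.1], prototypes)
```

Because few-shot and zero-shot classes share `build_prototypes` and `classify`, a single evaluation loop can mix both kinds of class, which is the sense in which the two settings are unified in this sketch.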