CV | AI | Nov 1, 2023

Beyond still images: Temporal features and input variance resilience

arXiv:2311.00800v2 | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the gap in incorporating spatiotemporal features into image-understanding models for improved robustness in vision tasks.

The paper addresses vision models' reliance on static images by developing a brain-inspired model that is trained on videos and includes temporal features; the resulting models are more resilient to input alterations.

Traditionally, vision models have predominantly relied on spatial features extracted from static images, deviating from the continuous stream of spatiotemporal features processed by the brain in natural vision. While numerous video-understanding models have emerged, the incorporation of videos into image-understanding models with spatiotemporal features has been limited. Drawing inspiration from natural vision, which exhibits remarkable resilience to input changes, our research focuses on the development of a brain-inspired model for vision understanding trained with videos. Our findings demonstrate that models that are trained on videos instead of still images and that include temporal features become more resilient to various alterations of the input media.

