CV · Jun 8, 2017

CortexNet: a Generic Network Family for Robust Visual Temporal Representations

arXiv:1706.02735v2 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for stable video representations in computer vision, offering a novel approach that could benefit video analysis applications, though it appears incremental in building on existing neural network architectures.

The authors address the problem of obtaining robust temporal representations from video data by proposing CortexNet, a family of deep neural networks inspired by the human visual system that includes top-down feedback and lateral connections. They also introduce unsupervised and weakly supervised training schemes for tasks such as frame anticipation and object tracking.

In the past five years we have observed the rise of incredibly well performing feed-forward neural networks trained supervisedly for vision related tasks. These models have achieved super-human performance on object recognition, localisation, and detection in still images. However, there is a need to identify the best strategy to employ these networks with temporal visual inputs and obtain a robust and stable representation of video data. Inspired by the human visual system, we propose a deep neural network family, CortexNet, which features not only bottom-up feed-forward connections, but also it models the abundant top-down feedback and lateral connections, which are present in our visual cortex. We introduce two training schemes - the unsupervised MatchNet and weakly supervised TempoNet modes - where a network learns how to correctly anticipate a subsequent frame in a video clip or the identity of its predominant subject, by learning egomotion clues and how to automatically track several objects in the current scene. Find the project website at https://engineering.purdue.edu/elab/CortexNet/.
