CV · May 28, 2021

What Is Considered Complete for Visual Recognition?

arXiv:2105.13978v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This is an opinion paper that advocates shifting the focus of visual recognition research from the accuracy-complexity tradeoff to a compression-recovery tradeoff. It could inspire new research directions, but it is incremental, since it builds on existing pre-training concepts.

The paper argues that current visual recognition systems are far from human-level completeness and are unlikely to close this gap simply by collecting more human annotations. It proposes a new pre-training task, learning-by-compression, in which models are optimized to represent visual data with compact features from which the original data can be recovered.

This is an opinion paper. We hope to deliver a key message: current visual recognition systems are far from complete, i.e., from recognizing everything that humans can recognize, yet it is very unlikely that the gap can be bridged by continuously increasing human annotations. Based on this observation, we advocate for a new type of pre-training task named learning-by-compression. The computational models (e.g., a deep network) are optimized to represent the visual data using compact features, and the features preserve the ability to recover the original data. Semantic annotations, when available, play the role of weak supervision. An important yet challenging issue is the evaluation of image recovery, for which we suggest some design principles and future research directions. We hope our proposal can inspire the community to pursue the compression-recovery tradeoff rather than the accuracy-complexity tradeoff.
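The abstract describes learning-by-compression only in words. The following is a minimal, hypothetical sketch of the kind of objective it implies, not the authors' method. Everything here is an assumption for illustration: a toy encoder (`truncate_encoder`) keeps the first `k` values as the compact feature, a toy decoder (`zero_pad_decoder`) tries to recover the input, mean squared error stands in for recovery quality, `len(z)` is a crude proxy for feature size, and an optional cross-entropy term (`classify`, `label_weight`) plays the role of weak supervision when a label happens to be available.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def truncate_encoder(k: int) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical toy encoder: keep only the first k values as the compact feature."""
    def encode(x: Sequence[float]) -> Vector:
        return list(x[:k])
    return encode


def zero_pad_decoder(n: int) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical toy decoder: rebuild an n-dim signal by zero-padding the feature."""
    def decode(z: Sequence[float]) -> Vector:
        return list(z) + [0.0] * (n - len(z))
    return decode


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error, used here only as a simple stand-in for recovery quality."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def compression_recovery_loss(
    x: Sequence[float],
    encode: Callable[[Sequence[float]], Vector],
    decode: Callable[[Sequence[float]], Vector],
    rate_weight: float = 0.01,
    label: Optional[int] = None,
    classify: Optional[Callable[[Vector], Sequence[float]]] = None,
    label_weight: float = 0.1,
) -> float:
    """Recovery error + weighted feature size (+ optional weak-supervision term)."""
    z = encode(x)        # compact feature
    x_hat = decode(z)    # attempted recovery of the original data
    # len(z) is a crude, assumed proxy for how compact the feature is.
    loss = mse(x, x_hat) + rate_weight * len(z)
    if label is not None and classify is not None:
        # Weak supervision: cross-entropy of the annotated class, used only when a label exists.
        probs = classify(z)
        loss += label_weight * -math.log(max(probs[label], 1e-12))
    return loss


if __name__ == "__main__":
    x = [0.9, 0.5, 0.3, 0.1]
    dec = zero_pad_decoder(len(x))
    for k in (1, 2, 4):
        enc = truncate_encoder(k)
        err = mse(x, dec(enc(x)))
        total = compression_recovery_loss(x, enc, dec)
        print(f"k={k}: recovery MSE={err:.4f}, loss={total:.4f}")
```

Shrinking `k` makes the feature more compact but raises the recovery error, which is the compression-recovery tradeoff in miniature. A practical system would replace the toy encoder and decoder with learned networks and `len(z)` with a bit-rate estimate; as the abstract notes, choosing a good measure of image recovery is itself an open problem, so MSE here is only a placeholder.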
