CV · May 28, 2021

What Is Considered Complete for Visual Recognition?

Lingxi Xie, Xiaopeng Zhang, Longhui Wei, Jianlong Chang, Qi Tian

arXiv:2105.13978v12.64 citations

Originality: Synthesis-oriented

AI Analysis

This opinion paper advocates shifting the focus of visual recognition research from the accuracy-complexity tradeoff to the compression-recovery tradeoff. It could inspire new research directions, but the contribution is incremental because it builds on existing pre-training concepts.

The paper argues that current visual recognition systems are far from human-level completeness and that this gap is very unlikely to be bridged by collecting more human annotations. It proposes a new pre-training task, learning-by-compression, in which models are optimized to represent visual data with compact features that can still recover the original data.

This is an opinion paper. We hope to deliver a key message: current visual recognition systems are far from complete, i.e., far from recognizing everything that a human can recognize, yet it is very unlikely that the gap can be bridged by continuously increasing human annotations. Based on this observation, we advocate for a new type of pre-training task named learning-by-compression. The computational models (e.g., a deep network) are optimized to represent the visual data using compact features, and the features preserve the ability to recover the original data. Semantic annotations, when available, play the role of weak supervision. An important yet challenging issue is the evaluation of image recovery, for which we suggest some design principles and future research directions. We hope our proposal can inspire the community to pursue the compression-recovery tradeoff rather than the accuracy-complexity tradeoff.
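
To make the compression-recovery tradeoff concrete, the toy sketch below may help. It is not the authors' method: `encode`, `decode`, and `recovery_error` are hypothetical stand-ins for a learned encoder, a learned decoder, and a recovery metric, and block averaging replaces any actual network. It only illustrates that more compact features tend to recover the original data less faithfully.

```python
# Toy illustration of a compression-recovery tradeoff (NOT the paper's method).
# encode() and decode() are hypothetical stand-ins for a learned encoder/decoder.

def encode(signal, block):
    """Compress by replacing each run of `block` values with its mean."""
    return [
        sum(signal[i:i + block]) / len(signal[i:i + block])
        for i in range(0, len(signal), block)
    ]

def decode(features, block, length):
    """Recover an approximation by repeating each feature `block` times."""
    recovered = [f for f in features for _ in range(block)]
    return recovered[:length]

def recovery_error(signal, block):
    """Mean squared error between the signal and its recovered version."""
    features = encode(signal, block)
    recovered = decode(features, block, len(signal))
    return sum((a - b) ** 2 for a, b in zip(signal, recovered)) / len(signal)

# A tiny 1-D "image": one row of made-up pixel intensities.
row = [0, 0, 1, 1, 4, 4, 9, 9, 2, 2, 3, 3, 8, 8, 5, 5]

for block in (1, 2, 4, 8):
    ratio = len(row) / len(encode(row, block))
    print(f"compression {ratio:.0f}x, recovery MSE {recovery_error(row, block):.3f}")
```

Mean squared error is used here only for simplicity. The abstract itself flags the evaluation of image recovery as an open and challenging issue, so a pixel-wise metric should be read as a placeholder rather than a recommended choice.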
