CV | Jul 25, 2022

TransCL: Transformer Makes Strong and Flexible Compressive Learning

arXiv:2207.11972v1 | 32 citations | h-index: 21 | Has Code
Originality: Highly original
AI Analysis

TransCL addresses the need for flexible and scalable compressive learning on high-resolution vision tasks, offering significant memory and computational savings.

The paper tackles the problem of compressive learning (CL) being limited to fixed compression ratios and simple datasets by proposing TransCL, a transformer-based framework that works on large-scale images with arbitrary compression ratios. The result is state-of-the-art performance in image classification and semantic segmentation, achieving nearly the same performance as using original data at a 10% compression ratio and satisfactory results even at 1%.

Compressive learning (CL) is an emerging framework that integrates signal acquisition via compressed sensing (CS) with machine learning, so that inference tasks run directly on a small number of measurements. It is a promising alternative to classical image-domain methods and offers substantial savings in memory and computation. However, previous attempts at CL are limited to a fixed CS ratio, which lacks flexibility, and to MNIST/CIFAR-like datasets, so they do not scale to complex real-world high-resolution (HR) data or vision tasks. This paper proposes TransCL, a novel transformer-based compressive learning framework for large-scale images with arbitrary CS ratios. Specifically, TransCL first adopts learnable block-based compressed sensing and proposes a flexible linear projection strategy, so that CL can be performed on large-scale images efficiently, block by block, at arbitrary CS ratios. Then, treating the CS measurements from all blocks as a sequence, a pure transformer-based backbone performs vision tasks through various task-oriented heads. Our analysis shows that TransCL exhibits strong resistance to interference and robust adaptability to arbitrary CS ratios. Extensive experiments on complex HR data demonstrate that TransCL achieves state-of-the-art performance in image classification and semantic segmentation. In particular, TransCL with a CS ratio of 10% obtains almost the same performance as operating directly on the original data, and it still obtains satisfactory performance at an extremely low CS ratio of 1%. The source code of TransCL is available at https://github.com/MC-E/TransCL/.
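The block-by-block sampling step described in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function name `block_cs_measure`, the 16x16 block size, the random matrix standing in for the learned sampling matrix, and the choice to support different CS ratios by keeping only the first m rows of one shared matrix are all assumptions made for illustration. The abstract does not say how TransCL's flexible linear projection is realized.

```python
import numpy as np

def block_cs_measure(image, phi, block=16, ratio=0.10):
    """Sample a grayscale image block by block (hypothetical helper).

    image: 2-D array whose sides are multiples of `block`.
    phi:   (block*block, block*block) sampling matrix; a random stand-in
           here, whereas the paper learns its sampling matrix.
    ratio: CS ratio, i.e. measurements per block / pixels per block.
    """
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    n = block * block
    m = max(1, int(round(ratio * n)))  # measurements kept per block
    # Split into non-overlapping blocks and flatten each to a length-n vector.
    blocks = (image.reshape(h // block, block, w // block, block)
                   .swapaxes(1, 2)
                   .reshape(-1, n))
    # Assumption: one shared matrix, truncated to its first m rows per ratio.
    return blocks @ phi[:m].T  # shape: (num_blocks, m)

rng = np.random.default_rng(0)
block = 16
phi = rng.standard_normal((block * block, block * block)) / block
image = rng.random((64, 64))

seq_10 = block_cs_measure(image, phi, block, ratio=0.10)  # 10% CS ratio
seq_01 = block_cs_measure(image, phi, block, ratio=0.01)  # 1% CS ratio
print(seq_10.shape, seq_01.shape)  # (16, 26) (16, 3)
```

Each row of the result holds the measurements of one block, so the array as a whole is the sequence that a transformer backbone would consume after a further projection to a fixed token width. Under the truncation assumption used here, the 1% measurements are simply the leading columns of the 10% measurements.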

Code Implementations: 1 repo