CV · Nov 19, 2020

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

arXiv:2011.09876v3 · 81 citations · Has Code
AI Analysis

This work addresses the trade-off between mask resolution and computational complexity for researchers and practitioners in computer vision, offering a more efficient way to represent high-quality masks.

This paper proposes DCT-Mask, a mask representation for instance segmentation that encodes a high-resolution binary grid mask into a compact vector using the discrete cosine transform. The method improves performance across various frameworks, backbones, datasets, and training schedules, with almost no cost to running speed.

Binary grid mask representation is broadly used in instance segmentation. A representative instantiation is Mask R-CNN, which predicts masks on a $28\times 28$ binary grid. Generally, a low-resolution grid is not sufficient to capture the details, while a high-resolution grid dramatically increases the training complexity. In this paper, we propose a new mask representation by applying the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector. Our method, termed DCT-Mask, can be easily integrated into most pixel-based instance segmentation methods. Without any bells and whistles, DCT-Mask yields significant gains on different frameworks, backbones, datasets, and training schedules. It does not require any pre-processing or pre-training, and does almost no harm to the running speed. The improvement is especially large for higher-quality annotations and more complex backbones. Moreover, we analyze the performance of our method from the perspective of the quality of mask representation. The main reason why DCT-Mask works well is that it obtains a high-quality mask representation with low complexity. Code is available at https://github.com/aliyun/DCT-Mask.git.
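The encoding idea in the abstract can be illustrated with a small, dependency-free sketch: take a binary grid mask, apply a 2D DCT, keep only the lowest-frequency coefficients as the compact vector, and invert the transform to recover the mask. This is not the authors' implementation (see the linked repository for that). The 64×64 grid, the 150-dimensional vector, the diagonal low-frequency ordering, the 0.5 threshold, and all function names are assumptions chosen for illustration. The sketch also only covers the representation itself, not the network that would predict the vector.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def low_freq_order(n):
    """Coefficient indices sorted by u + v, lowest frequencies first.

    A simplified zigzag-like ordering, assumed here for illustration.
    """
    return sorted(((u, v) for u in range(n) for v in range(n)),
                  key=lambda p: (p[0] + p[1], p[0]))


def encode(mask, dim):
    """Binary n x n mask -> vector of the `dim` lowest-frequency DCT coefficients."""
    n = len(mask)
    d = dct_matrix(n)
    coeffs = matmul(matmul(d, mask), transpose(d))  # 2D DCT: D M D^T
    return [coeffs[u][v] for u, v in low_freq_order(n)[:dim]]


def decode(vector, n):
    """Vector -> binary n x n mask: zero-fill, inverse 2D DCT, threshold."""
    coeffs = [[0.0] * n for _ in range(n)]
    for (u, v), c in zip(low_freq_order(n), vector):
        coeffs[u][v] = c
    d = dct_matrix(n)
    recon = matmul(matmul(transpose(d), coeffs), d)  # inverse: D^T C D
    return [[1 if x > 0.5 else 0 for x in row] for row in recon]


def iou(a, b):
    """Intersection over union of two binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union


# Illustrative sizes (assumptions, not values taken from the abstract).
N, DIM = 64, 150

# Synthetic ground-truth mask: a filled disk of radius 20.
mask = [[1 if (r - 32) ** 2 + (c - 32) ** 2 <= 20 ** 2 else 0
         for c in range(N)] for r in range(N)]

vector = encode(mask, DIM)
recon = decode(vector, N)
print("vector length:", len(vector), "of", N * N, "grid cells")
print("reconstruction IoU:", round(iou(mask, recon), 3))
```

Because the DCT matrix is orthonormal, keeping all `N * N` coefficients reconstructs the mask exactly. Truncating to the low-frequency coefficients trades a small loss in boundary detail for a much shorter vector, which is the trade-off the abstract describes as high-quality representation at low complexity.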

Code Implementations: 1 repo
