FourierNet: Compact mask representation for instance segmentation using differentiable shape decoders
This addresses the problem of efficient and interpretable mask representation for instance segmentation, offering a compressed approach with competitive performance.
The paper tackles instance segmentation by introducing FourierNet, a method that uses a compact shape vector representation converted to masks via a differentiable shape decoder, achieving 30.6 mAP on MS COCO 2017 and 26.6 FPS at lower resolutions.
We present FourierNet, a single shot, anchor-free, fully convolutional instance segmentation method that predicts a shape vector. Consequently, this shape vector is converted into the masks' contour points using a fast numerical transform. Compared to previous methods, we introduce a new training technique, where we utilize a differentiable shape decoder, which manages the automatic weight balancing of the shape vector's coefficients. We used the Fourier series as a shape encoder because of its coefficient interpretability and fast implementation. FourierNet shows promising results compared to polygon representation methods, achieving 30.6 mAP on the MS COCO 2017 benchmark. At lower image resolutions, it runs at 26.6 FPS with 24.3 mAP. It reaches 23.3 mAP using just eight parameters to represent the mask (note that at least four parameters are needed for bounding box prediction only). Qualitative analysis shows that suppressing a reasonable proportion of higher frequencies of Fourier series, still generates meaningful masks. These results validate our understanding that lower frequency components hold higher information for the segmentation task, and therefore, we can achieve a compressed representation. Code is available at: github.com/cogsys-tuebingen/FourierNet.