CV · CL · LG · Jun 11, 2020

Disentangled Non-Local Neural Networks

arXiv:2006.06668v2
376 citations
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in context modeling for computer vision tasks, offering an incremental improvement over existing non-local methods.

The paper addresses a limitation of non-local blocks in convolutional neural networks: their attention computation tightly couples a whitened pairwise term and a unary term. Decoupling the two terms improves performance on semantic segmentation, object detection, and action recognition, with reported gains such as a 1.5% mIoU increase on Cityscapes.

The non-local block is a popular module for strengthening the context modeling ability of a regular convolutional neural network. This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel. We also observe that the two terms trained alone tend to model different visual clues, e.g. the whitened pairwise term learns within-region relationships while the unary term learns salient boundaries. However, the two terms are tightly coupled in the non-local block, which hinders the learning of each. Based on these findings, we present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both terms. We demonstrate the effectiveness of the decoupled design on various tasks, such as semantic segmentation on Cityscapes, ADE20K and PASCAL Context, object detection on COCO, and action recognition on Kinetics.
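The split described in the abstract can be checked numerically. The sketch below assumes the dot-product (embedded Gaussian) form of non-local attention, where the weight from pixel i to pixel j is a softmax over j of q_i · k_j. Subtracting the means of the queries and keys gives a whitened pairwise term, and the leftover piece that still depends on j is a unary term μ_q · k_j. The function names, the toy shapes, and the separate unary logits `m` are illustrative assumptions, and the decoupled form shown (a sum of two softmaxes) is one plausible reading of "decoupled", not a confirmed reproduction of the authors' implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_attention(q, k):
    # Standard non-local attention: softmax over j of q_i . k_j.
    return softmax(q @ k.T, axis=-1)

def split_terms(q, k):
    # q, k: (N, C) query and key embeddings for N pixels.
    mu_q = q.mean(axis=0, keepdims=True)   # (1, C) mean query
    mu_k = k.mean(axis=0, keepdims=True)   # (1, C) mean key
    pairwise = (q - mu_q) @ (k - mu_k).T   # (N, N) whitened pairwise term
    unary = (k @ mu_q.T).T                 # (1, N) unary term, mu_q . k_j
    return pairwise, unary

def disentangled_attention(q, k, m):
    # Assumed decoupled form: the two terms are normalized separately and
    # added; m holds per-pixel unary logits from a hypothetical extra projection.
    pairwise, _ = split_terms(q, k)
    return softmax(pairwise, axis=-1) + softmax(m[None, :], axis=-1)

rng = np.random.default_rng(0)
N, C = 6, 4                                # toy sizes, chosen arbitrarily
q = rng.normal(size=(N, C))
k = rng.normal(size=(N, C))

# The remaining terms (q_i . mu_k and mu_q . mu_k) are constant in j,
# so they cancel inside the softmax and the coupled form is recovered exactly.
pairwise, unary = split_terms(q, k)
coupled = softmax(pairwise + unary, axis=-1)
assert np.allclose(coupled, nonlocal_attention(q, k))

m = rng.normal(size=N)                     # stand-in for learned unary logits
attn = disentangled_attention(q, k, m)
```

In the coupled form the two terms are added before a single softmax, so they end up multiplied inside the exponent and the gradient of one depends on the other. Normalizing them separately, as in `disentangled_attention`, is what lets each term be learned on its own in this sketch.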

Code Implementations: 5 repos
