CV | Jan 19, 2020

Learning Compositional Neural Information Fusion for Human Parsing

arXiv:2001.06804v1 | 131 citations
Originality: Incremental advance
AI Analysis

This work addresses human parsing for computer vision applications, offering an incremental improvement through a novel fusion method.

The paper tackles the problem of human parsing by proposing a neural information fusion framework that combines direct, bottom-up, and top-down inferences over a compositional hierarchy, achieving state-of-the-art results on four datasets with a processing speed of 23fps.

This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles the information from three inference processes over the hierarchy: direct inference (directly predicting each part of a human body using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively. In addition, the fusion of multi-source information is conditioned on the inputs, i.e., by estimating and considering the confidence of the sources. The whole model is end-to-end differentiable, explicitly modeling information flows and structures. Our approach is extensively evaluated on four popular datasets, outperforming the state of the art in all cases, with a fast processing speed of 23fps. Our code and results have been released to help ease future research in this direction.
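
The confidence-conditioned fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each of the three inference processes has already produced a per-pixel score map for one node of the hierarchy, and that a per-pixel confidence logit is available for each source. The names `fuse_pixel` and `fuse_maps`, and the choice of a normalized sigmoid-gated average, are hypothetical simplifications. In the paper the sources and their confidences are produced by learned networks.

```python
import math


def sigmoid(x):
    """Map a real-valued confidence logit to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_pixel(scores, conf_logits):
    """Fuse one pixel's scores from several sources.

    Each source is weighted by its gate, and the gates are normalized
    so the weights sum to 1 (an assumption made for this sketch).
    """
    gates = [sigmoid(c) for c in conf_logits]
    total = sum(gates)
    return sum(g * s for g, s in zip(gates, scores)) / total


def fuse_maps(direct, bottom_up, top_down, conf):
    """Fuse three H x W score maps given as nested lists.

    conf[k][i][j] is the hypothetical confidence logit of source k
    (0 = direct, 1 = bottom-up, 2 = top-down) at pixel (i, j).
    """
    height, width = len(direct), len(direct[0])
    return [
        [
            fuse_pixel(
                (direct[i][j], bottom_up[i][j], top_down[i][j]),
                (conf[0][i][j], conf[1][i][j], conf[2][i][j]),
            )
            for j in range(width)
        ]
        for i in range(height)
    ]
```

With equal confidence logits the result is the plain average of the three sources. When one source's logit is much larger than the others, the fused score approaches that source's score, which is the sense in which the fusion is "conditioned on the inputs".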

Code Implementations: 1 repo