Compositional Hierarchical Tensor Factorization: Representing Hierarchical Intrinsic and Extrinsic Causal Factors
This work addresses the challenge of robust object recognition in data-starved domains by providing an interpretable representation, though it is incremental as it builds on existing tensor decomposition methods.
The paper tackled the problem of representing visual objects by disentangling hierarchical intrinsic and extrinsic causal factors, proposing a compositional hierarchical tensor factorization that improves object recognition robustness to occlusion and reduces training data requirements, achieving encouraging face verification results on the Freiburg and LFW datasets with training on a reduced synthetic dataset.
Visual objects are composed of a recursive hierarchy of perceptual wholes and parts, whose properties, such as shape, reflectance, and color, constitute a hierarchy of intrinsic causal factors of object appearance. However, object appearance is the compositional consequence of both an object's intrinsic and extrinsic causal factors, where the extrinsic causal factors are related to illumination, and imaging conditions. Therefore, this paper proposes a unified tensor model of wholes and parts, and introduces a compositional hierarchical tensor factorization that disentangles the hierarchical causal structure of object image formation, and subsumes multilinear block tensor decomposition as a special case. The resulting object representation is an interpretable combinatorial choice of wholes' and parts' representations that renders object recognition robust to occlusion and reduces training data requirements. We demonstrate ourapproach in the context of face recognition by training on an extremely reduced dataset of synthetic images, and report encouragingface verification results on two datasets - the Freiburg dataset, andthe Labeled Face in the Wild (LFW) dataset consisting of real world images, thus, substantiating the suitability of our approach for data starved domains.