CVAILGApr 10, 2021

Latent Code-Based Fusion: A Volterra Neural Network Approach

arXiv:2104.04829v16 citations
Originality Highly original
AI Analysis

This work addresses multi-modal data fusion for clustering and classification tasks, representing an incremental improvement with a novel method for a known bottleneck.

The paper tackles multi-modal data fusion by proposing a Volterra Neural Network (VNN) auto-encoder that uses latent codes for self-representation embedding, achieving significant improvements in clustering performance and sample complexity over conventional CNN auto-encoders on two datasets.

We propose a deep structure encoder using the recently introduced Volterra Neural Networks (VNNs) to seek a latent representation of multi-modal data whose features are jointly captured by a union of subspaces. The so-called self-representation embedding of the latent codes leads to a simplified fusion which is driven by a similarly constructed decoding. The Volterra Filter architecture achieved reduction in parameter complexity is primarily due to controlled non-linearities being introduced by the higher-order convolutions in contrast to generalized activation functions. Experimental results on two different datasets have shown a significant improvement in the clustering performance for VNNs auto-encoder over conventional Convolutional Neural Networks (CNNs) auto-encoder. In addition, we also show that the proposed approach demonstrates a much-improved sample complexity over CNN-based auto-encoder with a superb robust classification performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes