CV | May 6, 2021

Multi-Perspective LSTM for Joint Visual Representation Learning

arXiv:2105.02802v1 | 10 citations | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses multi-perspective visual recognition for applications such as lip reading and face recognition. It represents an incremental improvement over existing LSTM architectures.

The authors address the problem of learning visual representations from sequences captured from multiple perspectives. They propose a novel LSTM cell architecture with additional gates and memories, and report that it outperforms the considered benchmarks on lip reading and face recognition in both recognition accuracy and complexity.

We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level. We demonstrate that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks. We validate the performance of our proposed architecture in the context of two multi-perspective visual recognition tasks, namely lip reading and face recognition. Three relevant datasets are considered, and the results are compared against fusion strategies, other existing multi-input LSTM architectures, and alternative recognition solutions. The experiments show the superior performance of our solution over the considered benchmarks, both in terms of recognition accuracy and complexity. We make our code publicly available at https://github.com/arsm/MPLSTM.
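
The abstract does not give the cell equations, so the following is only a toy sketch of the general idea of "additional gates and memories at the cell level" for two perspectives. Everything in it is an assumption made for illustration: the function names (`perspective_step`, `joint_step`, `encode`), the extra `cross` gate, and the way the two cell memories are mixed are hypothetical and are not taken from the paper or from the MPLSTM repository. Each perspective runs a standard LSTM update (intra-perspective), and the hypothetical cross gate then decides how much of the other perspective's memory to admit (inter-perspective).

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def affine(W, b, z):
    # Computes W @ z + b, with W given as a list of rows.
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]


def init_params(n_in, n_hid, rng):
    # One (W, b) pair per gate. "cross" is a hypothetical extra gate,
    # not a gate defined in the paper.
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {k: (mat(n_hid, n_in + n_hid), [0.0] * n_hid)
            for k in ("i", "f", "o", "g", "cross")}


def perspective_step(p, x, h, c):
    # Standard LSTM memory update for a single perspective (intra-perspective).
    z = x + h  # list concatenation of input and previous hidden state
    i = sigmoid(affine(*p["i"], z))
    f = sigmoid(affine(*p["f"], z))
    o = sigmoid(affine(*p["o"], z))
    g = [math.tanh(v) for v in affine(*p["g"], z)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    return c_new, o, z


def joint_step(pa, pb, xa, xb, state):
    ha, ca, hb, cb = state
    ca_new, oa, za = perspective_step(pa, xa, ha, ca)
    cb_new, ob, zb = perspective_step(pb, xb, hb, cb)
    # Assumed inter-perspective mixing: each cross gate scales the other
    # perspective's freshly updated memory before it is added in.
    ka = sigmoid(affine(*pa["cross"], za))
    kb = sigmoid(affine(*pb["cross"], zb))
    ca_mix = [c + k * other for c, k, other in zip(ca_new, ka, cb_new)]
    cb_mix = [c + k * other for c, k, other in zip(cb_new, kb, ca_new)]
    ha_new = [o * math.tanh(c) for o, c in zip(oa, ca_mix)]
    hb_new = [o * math.tanh(c) for o, c in zip(ob, cb_mix)]
    return ha_new, ca_mix, hb_new, cb_mix


def encode(seq_a, seq_b, n_hid, seed=0):
    # Runs the toy cell over two time-aligned sequences and returns the
    # concatenated final hidden states as a joint representation.
    rng = random.Random(seed)
    pa = init_params(len(seq_a[0]), n_hid, rng)
    pb = init_params(len(seq_b[0]), n_hid, rng)
    state = ([0.0] * n_hid, [0.0] * n_hid, [0.0] * n_hid, [0.0] * n_hid)
    for xa, xb in zip(seq_a, seq_b):
        state = joint_step(pa, pb, xa, xb, state)
    ha, _, hb, _ = state
    return ha + hb
```

For example, `encode(frames_view_a, frames_view_b, n_hid=4)` on two five-frame sequences of three features each returns an eight-element vector. In a real pipeline the inputs would be per-frame visual features rather than raw lists, and the weights would be trained rather than randomly initialised.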

