CL, LG, ML | Jan 12, 2023

Multimodal Deep Learning

arXiv:2301.04856v13500 citations | h-index: 128
Originality: Synthesis-oriented
AI Analysis

The paper synthesizes existing knowledge for AI researchers and practitioners. Its contribution is incremental, since it reviews existing methods rather than introducing new ones.

The paper provides a comprehensive overview of multimodal deep learning, reviewing state-of-the-art approaches, modeling frameworks, and architectures for handling multiple modalities, with an application to generative art.

This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning (natural language processing and computer vision) individually. Further, modeling frameworks are discussed where one modality is transformed into the other, as well as models in which one modality is utilized to enhance representation learning for the other. To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced. Finally, we also cover other modalities as well as general-purpose multimodal models, which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art) caps off this booklet.

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
