LG, CV · May 28, 2023

Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models

arXiv:2305.18433v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses cross-modal data generation for AI applications, but it appears incremental: it builds on existing diffusion models and contributes a new training and sampling scheme.

The paper tackles two shortcomings of existing cross-modal generative methods, cross-modal information loss and unidirectional generation, by proposing a multi-modal diffusion model training and sampling scheme inspired by human cognitive processes. The resulting model can generate data conditioned on all correlated modalities.

Most existing cross-modal generative methods based on diffusion models use guidance to provide control over the latent space to enable conditional generation across different modalities. Such methods focus on providing guidance through separately trained models, each for one modality. As a result, these methods suffer from cross-modal information loss and are limited to unidirectional conditional generation. Inspired by how humans synchronously acquire multi-modal information and learn the correlation between modalities, we explore a multi-modal diffusion model training and sampling scheme that uses channel-wise image conditioning to learn cross-modality correlation during the training phase to better mimic the learning process in the brain. Our empirical results demonstrate that our approach can achieve data generation conditioned on all correlated modalities.
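The abstract names channel-wise conditioning but gives no implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' method. Every modality is stacked as channels of one array and noised jointly with a standard DDPM forward process (assumed linear beta schedule, T = 1000), so a single denoiser always sees all modalities together. For conditional sampling we swap in a different, well-known technique, the replacement trick from score-based inpainting: at each reverse step the observed channels are overwritten with a freshly noised copy of the data, and only the remaining channels are generated. The names `stack_channels`, `training_example`, `sample_conditioned`, and `zero_denoiser` are hypothetical, and plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
import math
import random

# Assumed DDPM-style linear beta schedule; the paper's actual schedule is not stated.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []  # cumulative products of (1 - beta_t)
_prod = 1.0
for _b in betas:
    _prod *= 1.0 - _b
    alpha_bars.append(_prod)


def stack_channels(modalities):
    """Channel-wise concatenation: all channels of all modalities become one array."""
    return [list(ch) for channels in modalities for ch in channels]


def q_sample(x0, t, noise):
    """Forward process q(x_t | x_0), applied jointly to every channel."""
    a = alpha_bars[t]
    return [[math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(ch, n)]
            for ch, n in zip(x0, noise)]


def training_example(modalities, rng):
    """One (x_t, t, target_noise) triple for an epsilon-prediction MSE loss."""
    x0 = stack_channels(modalities)
    t = rng.randrange(T)
    noise = [[rng.gauss(0.0, 1.0) for _ in ch] for ch in x0]
    return q_sample(x0, t, noise), t, noise


def sample_conditioned(known, num_channels, length, denoiser, rng):
    """Generate the unknown channels; `known` maps channel index -> clean values."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(num_channels)]
    for t in reversed(range(T)):
        # Replacement trick (assumed): overwrite observed channels with data noised to level t.
        for c, values in known.items():
            noise = [rng.gauss(0.0, 1.0) for _ in values]
            x[c] = q_sample([values], t, [noise])[0]
        eps = denoiser(x, t)  # predicted noise for every channel
        alpha_t = 1.0 - betas[t]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        # DDPM reverse step: mean from the predicted noise, plus fresh noise if t > 0.
        x = [[(v - coef * e) / math.sqrt(alpha_t) + sigma * rng.gauss(0.0, 1.0)
              for v, e in zip(ch, e_ch)] for ch, e_ch in zip(x, eps)]
    for c, values in known.items():
        x[c] = list(values)  # return the observed channels exactly
    return x


def zero_denoiser(x, t):
    """Hypothetical stand-in for a trained network: always predicts zero noise."""
    return [[0.0] * len(ch) for ch in x]


rng = random.Random(0)
image = [[0.1, 0.2, 0.3, 0.4]]  # hypothetical modality A: one channel of 4 values
depth = [[0.5, 0.5, 0.5, 0.5]]  # hypothetical modality B: one channel of 4 values
x_t, t_sampled, target_noise = training_example([image, depth], rng)
# Condition on modality A (channel 0) and generate modality B (channel 1).
generated = sample_conditioned({0: image[0]}, num_channels=2, length=4,
                               denoiser=zero_denoiser, rng=rng)
```

Because every modality occupies its own channels of the same array, any subset can act as the condition simply by changing the keys of `known`; this is one plausible way a single jointly trained model could support generation in either direction. A real implementation would replace `zero_denoiser` with a trained network (for example a U-Net) fitted by minimizing the mean squared error between its output and `target_noise`.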
