LG CV
May 28, 2023

Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models

arXiv:2305.18433v1
3.81 citations

Originality: Incremental advance

AI Analysis

This work addresses cross-modal data generation. It appears to be an incremental advance, since it builds on existing diffusion models and mainly contributes a new training and sampling scheme.

The paper targets two limitations of existing cross-modal generative methods: cross-modal information loss and strictly unidirectional generation. It proposes a multi-modal diffusion model training and sampling scheme, inspired by human cognitive processes, that generates data conditioned on all correlated modalities.

Most existing cross-modal generative methods based on diffusion models use guidance to provide control over the latent space to enable conditional generation across different modalities. Such methods focus on providing guidance through separately trained models, each for one modality. As a result, these methods suffer from cross-modal information loss and are limited to unidirectional conditional generation. Inspired by how humans synchronously acquire multi-modal information and learn the correlation between modalities, we explore a multi-modal diffusion model training and sampling scheme that uses channel-wise image conditioning to learn cross-modality correlation during the training phase to better mimic the learning process in the brain. Our empirical results demonstrate that our approach can achieve data generation conditioned on all correlated modalities.
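The abstract does not spell out the architecture or the sampling procedure, so the following is only a rough illustration of one way "channel-wise conditioning" across modalities could work, not the authors' method. It assumes that all modalities are stacked as channels of a single tensor and diffused jointly (so one denoiser sees every modality during training), and that conditioning at sampling time is done by overwriting the known channels with a noised copy of the observed data at each reverse step. That replacement trick is a swapped-in technique (RePaint-style inpainting on top of standard DDPM ancestral sampling) and may differ from what the paper does. The linear beta schedule, all function names, the `image`/`depth` modalities, and the zero-output denoiser standing in for a trained network are hypothetical.

```python
import numpy as np


def linear_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Hypothetical linear DDPM noise schedule: betas, alphas, cumulative alpha products.
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def stack_modalities(*modalities):
    # Each modality has shape (C_i, H, W); stacking along axis 0 yields one joint tensor.
    return np.concatenate(modalities, axis=0)


def q_sample(x0, t, alpha_bars, noise):
    # Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise.
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise


def training_pair(x0, alpha_bars, rng):
    # One epsilon-prediction training example built from ALL modality channels at once.
    t = int(rng.integers(len(alpha_bars)))
    noise = rng.standard_normal(x0.shape)
    return q_sample(x0, t, alpha_bars, noise), t, noise  # input, timestep, target


def sample_conditioned(denoiser, known, known_mask, schedule, rng):
    # Generate the masked-out channels while the known channels are held to the data.
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(known.shape)
    for t in reversed(range(len(betas))):
        # Replace known channels with the observed data noised to level t.
        x_known = q_sample(known, t, alpha_bars, rng.standard_normal(known.shape))
        x = np.where(known_mask, x_known, x)
        eps = denoiser(x, t)
        # DDPM posterior mean with variance sigma_t^2 = beta_t.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape) if t > 0 else mean
    return np.where(known_mask, known, x)


# Usage with hypothetical toy modalities: a 3-channel image and a 1-channel depth map.
rng = np.random.default_rng(0)
image = rng.standard_normal((3, 8, 8))
depth = rng.standard_normal((1, 8, 8))
x0 = stack_modalities(image, depth)  # shape (4, 8, 8)
schedule = linear_schedule(50)
x_t, t, eps = training_pair(x0, schedule[2], rng)

mask = np.zeros(x0.shape, dtype=bool)
mask[:3] = True  # image channels known, depth channel generated
known = np.where(mask, x0, 0.0)
zero_denoiser = lambda x, t: np.zeros_like(x)  # placeholder for a trained network
out = sample_conditioned(zero_denoiser, known, mask, schedule, rng)
```

Because the known set is just a boolean mask over channels, flipping `mask` (depth known, image generated) reuses the same jointly trained denoiser in the other direction, which is the sense in which such a scheme avoids unidirectional conditioning.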
