IV | CV | Feb 7, 2022

Multi-modal data generation with a deep metric variational autoencoder

arXiv:2202.03434v1 | 3 citations
AI Analysis

This work addresses data augmentation for multi-modal medical datasets, but it is incremental as it applies an existing method to a new domain.

The paper tackles the problem of generating correlated multi-modal data, specifically otoscopy images and tympanometry measurements, by introducing a deep metric variational autoencoder with triplet loss. It reports promising results for conditional generation, which would enable efficient data augmentation.

We present a deep metric variational autoencoder for multi-modal data generation. The variational autoencoder employs triplet loss in the latent space, which allows for conditional data generation by sampling in the latent space within each class cluster. The approach is evaluated on a multi-modal dataset consisting of otoscopy images of the tympanic membrane with corresponding wideband tympanometry measurements. The modalities in this dataset are correlated, as they represent different aspects of the state of the middle ear, but they do not present a direct pixel-to-pixel correlation. The approach shows promising results for the conditional generation of pairs of images and tympanograms, and will allow for efficient data augmentation of data from multi-modal sources.
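The two ideas in the abstract, a triplet loss on latent vectors and conditional generation by sampling inside a class cluster, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Euclidean distance, a margin of 1.0, and a per-class diagonal Gaussian fitted to already-encoded latent vectors; the function names `triplet_loss`, `fit_class_clusters`, and `sample_latent` are hypothetical. The encoder and the image and tympanogram decoders are omitted.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The margin value is an assumption, not taken from the paper.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


def fit_class_clusters(latents, labels):
    """Estimate a per-class mean and per-dimension standard deviation.

    Assumption: each class cluster is summarised by a diagonal Gaussian.
    """
    by_class = {}
    for z, y in zip(latents, labels):
        by_class.setdefault(y, []).append(z)

    stats = {}
    for y, zs in by_class.items():
        dim = len(zs[0])
        mean = [sum(z[d] for z in zs) / len(zs) for d in range(dim)]
        std = [
            math.sqrt(sum((z[d] - mean[d]) ** 2 for z in zs) / len(zs))
            for d in range(dim)
        ]
        stats[y] = (mean, std)
    return stats


def sample_latent(stats, label, rng):
    """Draw one latent vector from the cluster of the requested class."""
    mean, std = stats[label]
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


# Toy latent vectors standing in for encoder outputs (made-up values).
toy_latents = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]]
toy_labels = ["a", "a", "b", "b"]

clusters = fit_class_clusters(toy_latents, toy_labels)
z_new = sample_latent(clusters, "a", random.Random(0))
```

In a full model, a sampled vector such as `z_new` would be passed through the decoders to produce an image and tympanogram pair for the chosen class. During training, the triplet term would be added to the usual variational autoencoder objective so that samples of the same class cluster together in the latent space.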
