CV · AI · Nov 19, 2023

Appearance Codes using Joint Embedding Learning of Multiple Modalities

arXiv:2311.11427v1 · h-index: 8 · Has Code
Originality: Incremental advance
AI Analysis

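This work addresses a limitation of appearance-code-based generative models for scene rendering: appearance codes for new images must be optimized at inference. Removing that step could make such models cheaper to deploy in applications like virtual reality or autonomous driving, although the contribution appears incremental because it builds on existing appearance-code techniques.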

The paper tackles the need to retrain appearance codes for each scene in generative modeling by proposing a joint embedding framework that learns appearance and structure from multiple modalities, enabling night-time renders from day-time codes without additional optimization. It reports generation quality similar to a baseline while learning no new codes for unseen images.

The use of appearance codes in recent work on generative modeling has enabled novel view renders with variable appearance and illumination, such as day-time and night-time renders of a scene. A major limitation of this technique is the need to train new appearance codes for every scene at inference. In this work, we address this problem by proposing a framework that learns a joint embedding space for the appearance and structure of the scene by enforcing a contrastive loss constraint between different modalities. We apply our framework to a simple Variational Auto-Encoder model on the RADIATE dataset \cite{sheeny2021radiate} and qualitatively demonstrate that we can generate new renders of night-time photos using day-time appearance codes without additional optimization iterations. Additionally, we compare our model to a baseline VAE that uses the standard per-image appearance code technique and show that our approach achieves generations of similar quality without learning appearance codes for any unseen images at inference.
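
The abstract does not give the exact form of its contrastive constraint. As a hedged illustration only, the sketch below uses the common symmetric InfoNCE (CLIP-style) loss in NumPy to show how a contrastive term can pull together embeddings of the same scene from two modalities while pushing apart embeddings of different scenes. The function names (`info_nce`, `l2_normalize`, `log_softmax`), the temperature of 0.1, the embedding size, and the camera/radar pairing are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch: symmetric InfoNCE is an assumed stand-in,
# not the paper's exact contrastive loss.
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = logits.max(axis=axis, keepdims=True)
    shifted = logits - m
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    Row i of z_a and row i of z_b are assumed to describe the same scene
    (a positive pair); all other rows in the batch serve as negatives.
    """
    a = l2_normalize(z_a)
    b = l2_normalize(z_b)
    logits = a @ b.T / temperature  # (N, N) scaled cosine similarities
    idx = np.arange(len(a))
    # Modality A -> B: each row should pick out its own partner.
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Modality B -> A: each column should pick out its own partner.
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical embeddings of 8 scenes from two modalities (e.g. camera and radar).
    z_camera = rng.normal(size=(8, 16))
    z_radar_aligned = z_camera + 0.01 * rng.normal(size=(8, 16))
    z_radar_random = rng.normal(size=(8, 16))
    print("aligned:", info_nce(z_camera, z_radar_aligned))
    print("random: ", info_nce(z_camera, z_radar_random))
```

Minimizing this loss drives each scene's embeddings from the two modalities toward each other, so the nearly identical pair scores a much lower loss than the unrelated pair. The temperature and dimensions here are arbitrary illustrative choices.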

Code Implementations: 1 repo