CV | AI | Sep 15, 2024

Towards Kinetic Manipulation of the Latent Space

arXiv:2409.09867v2 | Has Code
AI Analysis

This work addresses a limitation of current tools for latent space exploration, which are mostly GUI-based, by offering a novel camera-driven interaction method for researchers and practitioners in generative AI.

The paper tackles the problem of exploring the latent space of generative models by introducing Visual-reactive Interpolation: features extracted by pre-trained CNNs from a live RGB camera feed are used to manipulate the latent space, so that simple changes in the scene steer the exploration. The authors report that this approach works well and leaves considerable room for improvement.

The latent space of many generative models is rich in unexplored valleys and mountains. The majority of tools used for exploring them are so far limited to Graphical User Interfaces (GUIs). While specialized hardware can be used for this task, we show that a simple feature extraction of pre-trained Convolutional Neural Networks (CNNs) from a live RGB camera feed does a very good job at manipulating the latent space with simple changes in the scene, with vast room for improvement. We name this new paradigm Visual-reactive Interpolation, and the full code can be found at https://github.com/PDillis/stylegan3-fun.
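The general idea in the abstract (frame features in, latent vector out) can be illustrated with a minimal sketch. This is not the authors' implementation: the average-pooling `extract_features` function is a hypothetical stand-in for a pre-trained CNN, the random projection, the smoothing step, and every name and constant below are assumptions made for illustration, and synthetic frames replace the live camera feed.

```python
import math
import random

LATENT_DIM = 8  # assumption for the sketch; real generators use larger latents
GRID = 4        # the stand-in extractor pools each frame into GRID x GRID values


def make_frame(size=16, bright_until=0):
    """Synthetic grayscale frame: a bright square in the top-left corner."""
    return [[1.0 if (y < bright_until and x < bright_until) else 0.0
             for x in range(size)] for y in range(size)]


def extract_features(frame):
    """Hypothetical stand-in for a pre-trained CNN: average-pool into a grid."""
    block_h, block_w = len(frame) // GRID, len(frame[0]) // GRID
    feats = []
    for gy in range(GRID):
        for gx in range(GRID):
            block = [frame[y][x]
                     for y in range(gy * block_h, (gy + 1) * block_h)
                     for x in range(gx * block_w, (gx + 1) * block_w)]
            feats.append(sum(block) / len(block))
    return feats


def make_projection(n_features, latent_dim, seed=0):
    """Fixed random matrix mapping features to latent coordinates (assumption)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_features)
    return [[rng.gauss(0.0, scale) for _ in range(n_features)]
            for _ in range(latent_dim)]


def features_to_latent(feats, projection):
    """Center the features, then project them into the latent space."""
    mean = sum(feats) / len(feats)
    centered = [f - mean for f in feats]
    return [sum(w * c for w, c in zip(row, centered)) for row in projection]


def smooth(prev, new, alpha=0.2):
    """Exponential smoothing so the latent moves gradually between frames."""
    return [(1.0 - alpha) * p + alpha * n for p, n in zip(prev, new)]


# Two synthetic "camera frames": an empty scene and one with a bright object.
projection = make_projection(GRID * GRID, LATENT_DIM)
z_static = features_to_latent(extract_features(make_frame()), projection)
z_moved = features_to_latent(extract_features(make_frame(bright_until=8)), projection)
z_step = smooth(z_static, z_moved, alpha=0.5)  # halfway towards the new latent
```

In this sketch a uniform scene maps to the zero latent, and introducing an object moves the latent away from it; the smoothed `z_step` is the kind of vector one would feed to a generator on each frame. A real setup would replace `make_frame` with frames read from a camera and `extract_features` with activations from a pre-trained network.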

Code Implementations: 1 repo