CV | Feb 16

Image Generation with a Sphere Encoder

Kaiyu Yue, Menglin Jia, Ji Hou, Tom Goldstein

arXiv:2602.15030v16.04 citations | h-index: 11

Originality: Highly original

AI Analysis

This work addresses the computational bottleneck of image generation, offering a more efficient alternative to diffusion models.

The paper tackles the problem of slow inference in diffusion-based image generation by introducing the Sphere Encoder, which maps images onto a spherical latent space and can generate an image in a single forward pass. Using fewer than five steps, the method achieves performance competitive with many-step state-of-the-art diffusion models, significantly reducing inference cost.

We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and competing with many-step diffusion models using fewer than five steps. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the sphere encoder approach yields performance competitive with state-of-the-art diffusion models, but with a small fraction of the inference cost. The project page is available at https://sphere-encoder.github.io.
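The sampling procedure described in the abstract (decode a random point on the sphere, optionally loop through the encoder and decoder a few times) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the unit radius, the latent dimension, and the toy linear `toy_encoder`/`toy_decoder` stand-ins are all assumptions made for the example, and the abstract does not specify how the loop is actually carried out.

```python
import numpy as np

def project_to_sphere(z, radius=1.0):
    """Rescale each row of z to lie on a sphere of the given radius (assumed: unit)."""
    return radius * z / np.linalg.norm(z, axis=1, keepdims=True)

def sample_on_sphere(rng, batch, dim, radius=1.0):
    """Draw points uniformly on the sphere by normalizing isotropic Gaussian vectors."""
    return project_to_sphere(rng.standard_normal((batch, dim)), radius)

def generate(decoder, encoder, rng, batch, dim, steps=1):
    """Decode a random spherical latent; for steps > 1, re-encode and decode again.

    `decoder` and `encoder` are hypothetical callables standing in for trained networks.
    """
    z = sample_on_sphere(rng, batch, dim)
    x = decoder(z)                          # single forward pass when steps == 1
    for _ in range(steps - 1):
        z = project_to_sphere(encoder(x))   # map the sample back onto the sphere
        x = decoder(z)                      # decode the refined latent
    return x

# Toy stand-ins (random linear maps), used only to show the call pattern.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 12))
toy_decoder = lambda z: z @ W      # latent (8-dim) -> "image" (12-dim)
toy_encoder = lambda x: x @ W.T    # "image" -> latent
images = generate(toy_decoder, toy_encoder, rng, batch=4, dim=8, steps=3)
```

Normalizing Gaussian vectors is a standard way to sample uniformly on a sphere; whether the paper samples this way, and what radius it uses, is not stated in the abstract.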

