CV, AI
Jun 6, 2024

Nomic Embed Vision: Expanding the Latent Space

Zach Nussbaum, Brandon Duderstadt, Andriy Mulyar

arXiv:2406.18587v1 15.821 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for a unified representation space in AI, enabling better integration of vision and language models for multimodal applications.

The authors train nomic-embed-vision, an open image embedding model that shares its latent space with nomic-embed-text. The goal is a single latent space that performs well across vision, language, and multimodal tasks.

This technical report describes the training of nomic-embed-vision, a highly performant, open-code, open-weights image embedding model that shares the same latent space as nomic-embed-text. Together, nomic-embed-vision and nomic-embed-text form the first unified latent space to achieve high performance across vision, language, and multimodal tasks.
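One practical consequence of a shared latent space is that a text embedding can be compared directly against image embeddings, for example to rank images for a text query. The sketch below illustrates that idea only. It assumes the embeddings have already been computed, and the vectors, file names, and the helpers `cosine_similarity` and `rank_images` are hypothetical placeholders, not outputs or APIs of nomic-embed-vision or nomic-embed-text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Rank images by similarity to a text embedding (hypothetical helper).

    Assumes both kinds of embedding live in the same latent space,
    so they can be compared without any extra projection step.
    """
    scored = [
        (name, cosine_similarity(text_embedding, vector))
        for name, vector in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, made up for illustration only.
# Real embeddings would come from the text and image models.
query_embedding = [0.9, 0.1, 0.0]
image_embeddings = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
}

ranking = rank_images(query_embedding, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With real models, the only change would be where the vectors come from; the comparison step stays the same as long as both models embed into the same space.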

