NOMAD Projection
NOMAD Projection addresses a scaling bottleneck in AI explainability for researchers and practitioners working with massive datasets, though the contribution is incremental: it builds on existing nonlinear dimensionality reduction techniques.
The paper tackles the challenge of scaling unstructured data visualization to the large datasets used in AI explainability. It introduces NOMAD Projection, a method that runs on multiple GPUs and, according to the authors' empirical results, outperforms existing methods in both quality and speed. As a demonstration of scalability, the authors compute the first complete data map of Multilingual Wikipedia.
The rapid adoption of generative AI has driven an explosion in the size of datasets consumed and produced by AI models. Traditional methods for unstructured data visualization, such as t-SNE and UMAP, have not kept up with the pace of dataset scaling. This presents a significant challenge for AI explainability, which relies on methods such as t-SNE and UMAP for exploratory data analysis. In this paper, we introduce Negative Or Mean Affinity Discrimination (NOMAD) Projection, the first method for unstructured data visualization via nonlinear dimensionality reduction that can run on multiple GPUs at train time. We provide theory that situates NOMAD Projection as an approximate upper bound on the InfoNC-t-SNE loss, and empirical results that demonstrate NOMAD Projection's superior performance and speed profile compared to existing state-of-the-art methods. We demonstrate the scalability of NOMAD Projection by computing the first complete data map of Multilingual Wikipedia.
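For readers unfamiliar with the loss the abstract refers to, the following is a minimal sketch of an InfoNCE-style t-SNE objective for a single anchor point. It is not NOMAD Projection itself, and it is not taken from the paper: it assumes the commonly used form in which low-dimensional similarity is the Cauchy kernel 1 / (1 + d^2) and each positive neighbor is contrasted against a handful of sampled negatives. The function names and the toy points are hypothetical, chosen only for illustration.

```python
import math


def cauchy_affinity(a, b):
    """Unnormalized t-SNE similarity 1 / (1 + ||a - b||^2) between two map points."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + sq_dist)


def infonce_tsne_loss(anchor, positive, negatives):
    """Assumed InfoNCE-style loss: -log(q_pos / (q_pos + sum of q_neg))."""
    q_pos = cauchy_affinity(anchor, positive)
    q_neg = sum(cauchy_affinity(anchor, n) for n in negatives)
    return -math.log(q_pos / (q_pos + q_neg))


# Toy 2-D embedding: one anchor, two candidate positions for its neighbor,
# and two sampled negatives (all values are made up for illustration).
anchor = (0.0, 0.0)
near = (0.1, 0.0)
far = (3.0, 0.0)
negatives = [(5.0, 5.0), (-4.0, 2.0)]

loss_near = infonce_tsne_loss(anchor, near, negatives)
loss_far = infonce_tsne_loss(anchor, far, negatives)
print(f"loss with neighbor placed near: {loss_near:.4f}")
print(f"loss with neighbor placed far:  {loss_far:.4f}")
```

Placing the true neighbor close to the anchor gives a lower loss than placing it far away, which is the behavior an optimizer would exploit. In this assumed form, the sum over negatives is the term whose cost grows with the number of samples, which is presumably why an approximation to it matters at scale.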