Creating User-steerable Projections with Interactive Semantic Mapping
This work addresses the need for more interpretable, user-steerable visualization tools for image and text analysis. The contribution is incremental, since it builds on existing dimensionality reduction (DR) and Multimodal Large Language Model (MLLM) methods.
The paper tackles a limitation of dimensionality reduction techniques: they cannot explore semantic structure that is not explicit in the data. It introduces a user-guided projection framework in which MLLMs produce customizable visualizations from natural-language prompts, improving cluster separation and supporting interactive data exploration.
Dimensionality reduction (DR) techniques map high-dimensional data into lower-dimensional spaces. Yet current DR techniques are not designed to explore semantic structure that is not directly available in the form of variables or class labels. We introduce a novel user-guided projection framework for image and text data that enables customizable, interpretable data visualizations via zero-shot classification with Multimodal Large Language Models (MLLMs). Users steer projections dynamically through natural-language guiding prompts, which specify high-level semantic relationships of interest that are not explicitly present in the data dimensions. We evaluate our method across several datasets and show that it not only enhances cluster separation but also turns DR into an interactive, user-driven process. Our approach bridges the gap between fully automated DR techniques and human-centered data exploration, offering a flexible and adaptive way to tailor projections to specific analytical needs.
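The abstract does not say how the zero-shot MLLM output enters the projection, so the following is only a minimal sketch of one way prompt-derived labels could steer a layout. It assumes that per-item label scores are appended to the item embeddings before projecting, and it uses plain PCA as a stand-in for whichever DR technique the framework actually employs. The names `steered_projection`, `fake_zero_shot_scores`, `separation`, and the `weight` parameter are hypothetical, and the score function merely imitates what a zero-shot MLLM query might return.

```python
import numpy as np


def pca_2d(features):
    """Project the rows of `features` onto their top two principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def steered_projection(embeddings, label_scores, weight=1.0):
    """Append prompt-derived label scores to the embeddings, then project to 2-D.

    embeddings:   (n, d) array of item features.
    label_scores: (n, k) array of scores for k user-named labels
                  (assumed to come from a zero-shot MLLM query).
    weight:       hypothetical steering knob; 0 ignores the prompt,
                  larger values let the prompt dominate the layout.
    """
    # Standardize the embeddings so `weight` is comparable across datasets.
    scale = embeddings.std(axis=0)
    scale[scale == 0] = 1.0
    standardized = (embeddings - embeddings.mean(axis=0)) / scale
    augmented = np.hstack([standardized, weight * label_scores])
    return pca_2d(augmented)


def fake_zero_shot_scores(true_labels, n_labels, confidence=0.8):
    """Stand-in for an MLLM call: a soft one-hot score row per item (assumption)."""
    rest = (1.0 - confidence) / (n_labels - 1)
    scores = np.full((len(true_labels), n_labels), rest)
    scores[np.arange(len(true_labels)), true_labels] = confidence
    return scores


def separation(points, labels):
    """Mean distance between class centroids divided by mean within-class spread."""
    classes = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in classes])
    within = np.mean([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    between = pairwise[np.triu_indices(len(classes), k=1)].mean()
    return between / within


# Synthetic data: embeddings carry no trace of the semantic grouping, which
# exists only in the (simulated) answers to the user's guiding prompt.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 8))
hidden = np.arange(60) % 3
scores = fake_zero_shot_scores(hidden, n_labels=3)

plain = steered_projection(embeddings, scores, weight=0.0)
steered = steered_projection(embeddings, scores, weight=10.0)

print("separation without steering:", separation(plain, hidden))
print("separation with steering:   ", separation(steered, hidden))
```

In this toy setup the unsteered layout cannot separate the groups, because the grouping is absent from the embedding dimensions, while the steered layout does. The `separation` ratio is an ad hoc measure chosen for the sketch and is not the evaluation metric used in the paper.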