LangLasso: Interactive Cluster Descriptions through LLM Explanation
The work addresses the challenge of semantic interpretability in cluster analysis for non-experts. The contribution is incremental, since it complements existing visual analytics methods rather than replacing them.
The paper tackles the problem of interpreting clusters in dimensionality-reduced data by introducing LangLasso, a method that uses large language models to generate interactive, natural language descriptions, making cluster interpretation accessible to non-experts and integrating external knowledge.
Dimensionality reduction is a powerful technique for revealing structure and potential clusters in data. However, as the axes are complex, non-linear combinations of features, they often lack semantic interpretability. Existing visual analytics (VA) methods support cluster interpretation through feature comparison and interactive exploration, but they require technical expertise and intense human effort. We present LangLasso, a novel method that complements VA approaches through interactive, natural language descriptions of clusters using large language models (LLMs). It produces human-readable descriptions that make cluster interpretation accessible to non-experts and allow integration of external contextual knowledge beyond the dataset. We systematically evaluate the reliability of these explanations and demonstrate that LangLasso provides an effective first step for engaging broader audiences in cluster interpretation. The tool is available at https://langlasso.vercel.app
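
To make the idea of describing a selected cluster in natural language more concrete, the following is a minimal sketch of how a selection could be turned into an LLM prompt. It is not the LangLasso implementation: the function names `summarize_selection` and `build_prompt`, the standardized-mean-difference ranking, and the prompt wording are all assumptions made for illustration, and the call to an actual LLM is deliberately left out.

```python
from statistics import mean, pstdev


def summarize_selection(rows, feature_names, selected, top_k=3):
    """Rank features by how strongly the selected points differ from the rest.

    Hypothetical helper: uses a standardized mean difference, which is an
    assumption for this sketch and not necessarily what LangLasso computes.
    """
    inside = [r for r, s in zip(rows, selected) if s]
    outside = [r for r, s in zip(rows, selected) if not s]
    if not inside or not outside:
        raise ValueError("selection must split the data into two non-empty groups")

    summary = []
    for j, name in enumerate(feature_names):
        column = [r[j] for r in rows]
        spread = pstdev(column) or 1.0  # fall back to 1.0 for constant features
        in_mean = mean(r[j] for r in inside)
        out_mean = mean(r[j] for r in outside)
        summary.append((name, in_mean, out_mean, (in_mean - out_mean) / spread))

    # Most distinguishing features first, regardless of sign.
    summary.sort(key=lambda item: abs(item[3]), reverse=True)
    return summary[:top_k]


def build_prompt(summary, context=""):
    """Assemble a plain-text prompt; the wording is illustrative only."""
    lines = ["Describe in plain language what distinguishes the selected cluster."]
    if context:
        # External knowledge beyond the dataset can be passed in as free text.
        lines.append(f"Background: {context}")
    for name, in_mean, out_mean, effect in summary:
        lines.append(
            f"- {name}: selected mean {in_mean:.2f}, other mean {out_mean:.2f}, "
            f"standardized difference {effect:+.2f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data: the first two points stand in for a lassoed selection.
    rows = [[1.0, 10.0], [1.2, 50.0], [5.0, 12.0], [5.2, 48.0]]
    features = ["petal_length", "noise"]
    selected = [True, True, False, False]
    print(build_prompt(summarize_selection(rows, features, selected), "iris flowers"))
```

The resulting string would be sent to an LLM of choice. Keeping the statistics separate from the prompt text makes it possible to check the numbers the model is asked to explain, which matters when the reliability of the generated descriptions is in question.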