HC · AI · LG · May 27

SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping

arXiv:2605.28219 · 9.8
Predicted impact: top 89% in HC · last 90 days
Originality: Incremental advance
AI Analysis

For analysts using unsupervised clustering or topic modeling, SmartIterator provides actionable, method-specific guidance for navigating parameter spaces and interpreting how data structure evolves across configurations, so that groupings are not chosen and evaluated without human oversight.

SmartIterator introduces a visual analytics approach with structured six-phase workflows for supervising unsupervised data grouping across parameter sweeps. It is demonstrated on three datasets, including simulated social media messages from the VAST Challenge 2011, where the results were validated against ground truth. The workflows enable analysts to build a cumulative understanding of data structure that no single "best" result can provide.

Unsupervised learning methods (topic modeling, partition-based clustering, and density-based clustering) produce data groupings without human guidance, yet choosing and evaluating those groupings should not itself be unsupervised. We present SmartIterator (SI), a visual analytics approach that treats the full sequence of grouping results across a parameter sweep as a first-class analytical object. For each method family, SI provides a structured six-phase workflow that guides the analyst through systematic exploration of grouping results, from quality-metric overview through transition-stability assessment, membership-confidence evaluation, content and context inspection, and recurrent-archetype verification to an informed decision, building cumulative understanding of data structure along the way. The workflows are operationalized through IteraScope (IS), a coordinated visual display combining quality-metric charts with semantic color encoding, a 1D group embedding with Sankey-style transition flows and violin plots of membership confidence, a 2D group embedding with HDBSCAN-detected recurrent archetypes that highlights iterations capturing all persistent patterns, and domain-specific linked views for contextualized interpretation. We demonstrate the three workflows on: (1) simulated social-media messages from the VAST Challenge 2011 (density-based clustering, validated against ground truth), (2) EU population statistics across ~1,500 NUTS-3 regions (partition-based clustering), and (3) 30 years of IEEE VIS papers (NMF topic modeling). The workflows constitute the main contribution: they provide actionable, method-specific guidance for navigating parameter spaces, studying how data structure evolves across configurations, and grounding analytical understanding in domain context, yielding knowledge about the data that no single "best" result can provide.
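
To make the idea of transition flows between consecutive sweep iterations concrete, the following is a minimal sketch, not the paper's implementation. It assumes each iteration of a parameter sweep is stored as a list of group labels over the same items, counts how many items move from each group to each group in the next iteration (the quantities a Sankey-style view would draw), and derives a simple stability score. The function names, the toy `sweep` data, and the stability definition are all hypothetical choices made for illustration.

```python
from collections import Counter


def transition_flows(prev_labels, next_labels):
    """Count items moving from each group in one iteration to each group in the next."""
    if len(prev_labels) != len(next_labels):
        raise ValueError("both labelings must cover the same items in the same order")
    return Counter(zip(prev_labels, next_labels))


def transition_stability(prev_labels, next_labels):
    """Fraction of items that follow the largest outgoing flow of their previous group.

    Illustrative assumption, not a metric taken from the paper: 1.0 means every
    previous group moves as a whole into a single next group (merges still
    count as stable); lower values mean groups were split.
    """
    if not prev_labels:
        raise ValueError("labelings must not be empty")
    flows = transition_flows(prev_labels, next_labels)
    dominant = {}
    for (src, _dst), count in flows.items():
        dominant[src] = max(dominant.get(src, 0), count)
    return sum(dominant.values()) / len(prev_labels)


# Hypothetical sweep: the same six items grouped at three parameter settings.
sweep = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
    4: [0, 3, 2, 1, 1, 1],
}

settings = sorted(sweep)
for a, b in zip(settings, settings[1:]):
    flows = transition_flows(sweep[a], sweep[b])
    score = transition_stability(sweep[a], sweep[b])
    print(f"setting {a} -> {b}: flows={dict(flows)} stability={score:.2f}")
```

In practice the label lists would come from whichever grouping method is being swept (for example, cluster assignments at successive parameter values), and the flow counts could feed a plotting library. Only the counting step is sketched here.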

