Stephen Kobourov

h-index: 42

31 papers

604 citations

Novelty: 34%

AI Score: 54

Ranked #9,512 of 194,257 authors (top 5%); #21 in HC (top 1%)

31 Papers

8.1 HC Aug 31, 2023 Code

Balancing between the Local and Global Structures (LGS) in Graph Embedding

Jacob Miller, Vahan Huroyan, Stephen Kobourov

We present a method for balancing between the Local and Global Structures (LGS) in graph embedding, via a tunable parameter. Some embedding methods aim to capture global structures, while others attempt to preserve local neighborhoods. Few methods attempt to do both, and it is not always possible to capture both local and global information well in two dimensions, which is where most graph drawings live. The choice of using a local or a global embedding for visualization depends not only on the task but also on the structure of the underlying data, which may not be known in advance. For a given graph, LGS aims to find a good balance between the local and global structure to preserve. We evaluate the performance of LGS with synthetic and real-world datasets and our results indicate that it is competitive with the state-of-the-art methods, using established quality metrics such as stress and neighborhood preservation. We introduce a novel quality metric, cluster distance preservation, to assess intermediate structure capture. All source code, datasets, experiments and analysis are available online.

7.6 CG May 30

Representing Hypergraphs by Point-Line Incidences

Alexander Dobler, Stephen Kobourov, Debajyoti Mondal et al.

We consider hypergraph visualizations that represent vertices as points in the plane and hyperedges as curves passing through the points of their incident vertices. Specifically, we consider several different variants of this problem by (a) restricting the curves to be lines or line segments, (b) allowing two curves to cross if they do not share an element, or not; and (c) allowing two curves to overlap or not. We show $\exists\mathbb{R}$-hardness for six of the eight resulting decision problem variants and describe polynomial-time algorithms in some restricted settings. Lastly, we briefly touch on what happens if we allow the lines of the represented hyperedges to have bends; to this end, we generalize a counterexample to a long-standing result that was sometimes assumed to be correct.

2.0 LG Nov 17, 2023 Code

Graph Sparsifications using Neural Network Assisted Monte Carlo Tree Search

Alvin Chiu, Mithun Ghosh, Reyan Ahmed et al.

Graph neural networks have been successful for machine learning, as well as for combinatorial and graph problems such as the Subgraph Isomorphism Problem and the Traveling Salesman Problem. We describe an approach for computing graph sparsifiers by combining a graph neural network and Monte Carlo Tree Search. We first train a graph neural network that takes as input a partial solution and proposes a new node to be added as output. This neural network is then used in a Monte Carlo search to compute a sparsifier. The proposed method consistently outperforms several standard approximation algorithms on different types of graphs and often finds the optimal solution.

2.0 LG Apr 30, 2023 Code

Nearly Optimal Steiner Trees using Graph Neural Network Assisted Monte Carlo Tree Search

Reyan Ahmed, Mithun Ghosh, Kwang-Sung Jun et al.

Graph neural networks are useful for learning problems, as well as for combinatorial and graph problems such as the Subgraph Isomorphism Problem and the Traveling Salesman Problem. We describe an approach for computing Steiner Trees by combining a graph neural network and Monte Carlo Tree Search. We first train a graph neural network that takes as input a partial solution and proposes a new node to be added as output. This neural network is then used in a Monte Carlo search to compute a Steiner tree. The proposed method consistently outperforms the standard 2-approximation algorithm on many different types of graphs and often finds the optimal solution.

7.9 HC Apr 9

Exploring MLLMs Perception of Network Visualization Principles

Jacob Miller, Markus Wallinger, Ludwig Felder et al.

In this paper, we test whether Multimodal Large Language Models (MLLMs) can match human-subject performance in tasks involving the perception of properties in network layouts. Specifically, we replicate a human-subject experiment about perceiving quality (namely stress) in network layouts using GPT-4o, Gemini-2.5 and Qwen2.5. Our experiments show that giving MLLMs the same study information as trained human participants yields performance comparable to that of human experts and exceeds that of untrained non-experts. Additionally, we show that prompt engineering that deviates from the human-subject experiment can lead to better-than-human performance in some settings. Interestingly, like human subjects, the MLLMs seem to rely on visual proxies rather than computing the actual value of stress, indicating some sense or facsimile of perception. Explanations from the models are similar to those used by the human participants (e.g., an even distribution of nodes and uniform edge lengths).

5.8 LG May 24, 2022 Code

ENS-t-SNE: Embedding Neighborhoods Simultaneously t-SNE

Jacob Miller, Vahan Huroyan, Raymundo Navarrete et al.

When visualizing a high-dimensional dataset, dimension reduction techniques are commonly employed which provide a single 2-dimensional view of the data. We describe ENS-t-SNE: an algorithm for Embedding Neighborhoods Simultaneously that generalizes the t-Stochastic Neighborhood Embedding approach. By using different viewpoints in ENS-t-SNE's 3D embedding, one can visualize different types of clusters within the same high-dimensional dataset. This enables the viewer to see and keep track of the different types of clusters, which is harder to do when providing multiple 2D embeddings, where corresponding points cannot be easily identified. We illustrate the utility of ENS-t-SNE with real-world applications and provide an extensive quantitative evaluation with datasets of different types and sizes.

6.4 LG Aug 14, 2024 Code

"Normalized Stress" is Not Normalized: How to Interpret Stress Correctly

Kiran Smelser, Jacob Miller, Stephen Kobourov

Stress is among the most commonly employed quality metrics and optimization criteria for dimension reduction projections of high dimensional data. Complex, high dimensional data is ubiquitous across many scientific disciplines, including machine learning, biology, and the social sciences. One of the primary methods of visualizing these datasets is with two dimensional scatter plots that visually capture some properties of the data. Because visually determining the accuracy of these plots is challenging, researchers often use quality metrics to measure projection accuracy or faithfulness to the full data. One of the most commonly employed metrics, normalized stress, is sensitive to uniform scaling of the projection, despite this act not meaningfully changing anything about the projection. We investigate the effect of scaling on stress and other distance based quality metrics analytically and empirically by showing just how much the values change and how this affects dimension reduction technique evaluations. We introduce a simple technique to make normalized stress scale invariant and show that it accurately captures expected behavior on a small benchmark.
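
The scale sensitivity described above can be illustrated with a small sketch. It assumes the common definition of normalized stress (sum of squared distance errors divided by the sum of squared original distances) and makes the metric scale-invariant by evaluating it at the uniform scale factor that minimizes it. This is one natural construction and may differ in detail from the authors' technique; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Euclidean distances for all unordered point pairs, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

def normalized_stress(high_d, low_d):
    """Squared distance error divided by the sum of squared original distances."""
    num = sum((d - e) ** 2 for d, e in zip(high_d, low_d))
    den = sum(d * d for d in high_d)
    return num / den

def scale_invariant_stress(high_d, low_d):
    """Normalized stress at the uniform scale factor alpha that minimizes it.

    Setting the derivative of sum (d - alpha * e)^2 with respect to alpha
    to zero gives alpha = sum(d * e) / sum(e * e).
    """
    alpha = sum(d * e for d, e in zip(high_d, low_d)) / sum(e * e for e in low_d)
    return normalized_stress(high_d, [alpha * e for e in low_d])

# Hypothetical toy data: four 3D points and a 2D projection that drops z.
data = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
proj = [(x, y) for x, y, _ in data]
high = pairwise_distances(data)
low = pairwise_distances(proj)
# The same projection, uniformly stretched by a factor of 10.
low_scaled = pairwise_distances([(10 * x, 10 * y) for x, y in proj])
```

Here `normalized_stress(high, low)` and `normalized_stress(high, low_scaled)` differ even though the two projections have the same shape, while `scale_invariant_stress` returns the same value for both up to floating-point error.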

1.2 GR Nov 5, 2025

Visualization Biases MLLM's Decision Making in Network Data Tasks

Timo Brand, Henry Förster, Stephen G. Kobourov et al.

We evaluate how visualizations can influence the judgment of MLLMs about the presence or absence of bridges in a network. We show that the inclusion of visualization improves confidence over a structured text-based input that could theoretically be helpful for answering the question. On the other hand, we observe that standard visualization techniques create a strong bias towards accepting or refuting the presence of a bridge -- independently of whether or not a bridge actually exists in the network. While our results indicate that the inclusion of visualization techniques can effectively influence the MLLM's judgment without compromising its self-reported confidence, they also imply that practitioners must be careful of allowing users to include visualizations in generative AI applications so as to avoid undesired hallucinations.

4.1 LG Oct 9, 2025

How Scale Breaks "Normalized Stress" and KL Divergence: Rethinking Quality Metrics

Kiran Smelser, Kaviru Gunaratne, Jacob Miller et al.

Complex, high-dimensional data is ubiquitous across many scientific disciplines, including machine learning, biology, and the social sciences. One of the primary methods of visualizing these datasets is with two-dimensional scatter plots that visually capture some properties of the data. Because visually determining the accuracy of these plots is challenging, researchers often use quality metrics to measure the projection's accuracy and faithfulness to the original data. One of the most commonly employed metrics, normalized stress, is sensitive to uniform scaling (stretching, shrinking) of the projection, despite this act not meaningfully changing anything about the projection. Another quality metric, the Kullback--Leibler (KL) divergence used in the popular t-Distributed Stochastic Neighbor Embedding (t-SNE) technique, is also susceptible to this scale sensitivity. We investigate the effect of scaling on stress and KL divergence analytically and empirically by showing just how much the values change and how this affects dimension reduction technique evaluations. We introduce a simple technique to make both metrics scale-invariant and show that it accurately captures expected behavior on a small benchmark.

1.2 CG Sep 7, 2025

Using Reinforcement Learning to Optimize the Global and Local Crossing Number

Timo Brand, Henry Förster, Stephen Kobourov et al.

We present a novel approach to graph drawing based on reinforcement learning for minimizing the global and the local crossing number, that is, the total number of edge crossings and the maximum number of crossings on any edge, respectively. In our framework, an agent learns how to move a vertex based on a given observation vector in order to optimize its position. The agent receives feedback in the form of local reward signals tied to crossing reduction. To generate an initial layout, we use a stress-based graph-drawing algorithm. We compare our method against force- and stress-based (baseline) algorithms as well as three established algorithms for global crossing minimization on a suite of benchmark graphs. The experiments show mixed results: our current algorithm is mainly competitive for the local crossing number. We see a potential for further development of the approach in the future.
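
The two objectives defined above can be computed directly for a straight-line drawing. The following is a minimal sketch, not the authors' implementation: it assumes vertices in general position (no three collinear endpoints), does not count edges that share an endpoint as crossing, and uses hypothetical names throughout.

```python
from itertools import combinations

def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def segments_cross(p1, p2, q1, q2):
    """True if the two segments cross properly (each separates the other's endpoints)."""
    return (orient(p1, p2, q1) * orient(p1, p2, q2) < 0 and
            orient(q1, q2, p1) * orient(q1, q2, p2) < 0)

def crossing_numbers(pos, edges):
    """Return (global, local): total crossings and the maximum crossings on any edge."""
    per_edge = {e: 0 for e in edges}
    total = 0
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            continue  # edges sharing an endpoint are not counted as crossing
        if segments_cross(pos[e[0]], pos[e[1]], pos[f[0]], pos[f[1]]):
            per_edge[e] += 1
            per_edge[f] += 1
            total += 1
    return total, max(per_edge.values(), default=0)

# Hypothetical example 1: K4 drawn on a unit square, so only the two diagonals cross.
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
k4_edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2), (1, 3)]

# Hypothetical example 2: one horizontal edge crossed by three vertical edges.
comb = {"a": (0, 0), "b": (4, 0),
        "c1": (1, -1), "d1": (1, 1),
        "c2": (2, -1), "d2": (2, 1),
        "c3": (3, -1), "d3": (3, 1)}
comb_edges = [("a", "b"), ("c1", "d1"), ("c2", "d2"), ("c3", "d3")]
```

In the first example both counts are 1. In the second the global count is 3 and the local count is also 3, because all crossings lie on the single horizontal edge; a layout that spreads the same number of crossings over several edges would have a lower local crossing number.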

6.5 LG Aug 18, 2021

Computing Steiner Trees using Graph Neural Networks

Reyan Ahmed, Md Asadullah Turja, Faryad Darabi Sahneh et al.

Graph neural networks have been successful in many learning problems and real-world applications. A recent line of research explores the power of graph neural networks to solve combinatorial and graph algorithmic problems such as subgraph isomorphism, detecting cliques, and the traveling salesman problem. However, many NP-complete problems are as of yet unexplored using this method. In this paper, we tackle the Steiner Tree Problem. We employ four learning frameworks to compute low-cost Steiner trees: feed-forward neural networks, graph neural networks, graph convolutional networks, and a graph attention model. We use these frameworks in two fundamentally different ways: 1) to train the models to learn the actual Steiner tree nodes, 2) to train the model to learn good Steiner point candidates to be connected to the constructed tree using a shortest path in a greedy fashion. We illustrate the robustness of our heuristics on several random graph generation models as well as the SteinLib data library. Our findings suggest that the out-of-the-box application of GNN methods does worse than the classic 2-approximation method. However, when combined with a greedy shortest path construction, it even does slightly better than the 2-approximation algorithm. This result sheds light on the fundamental capabilities and limitations of graph learning techniques on classical NP-complete problems.
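
For readers unfamiliar with the baseline, the following sketch shows one classic 2-approximation for the Steiner tree problem. The abstract does not say which variant was used, so this assumes the metric-closure heuristic: build a minimum spanning tree over shortest-path distances between terminals, then expand each tree edge into its underlying shortest path. The cleanup steps of the full procedure (taking a spanning tree of the result and pruning non-terminal leaves) are omitted, the graph is assumed connected and undirected, and all names are hypothetical.

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances and predecessor map from src."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def steiner_2approx(adj, terminals):
    """Edges of the expanded metric-closure MST connecting all terminals."""
    terminals = list(terminals)
    sp = {t: dijkstra(adj, t) for t in terminals}
    in_tree = {terminals[0]}
    edges = set()
    while len(in_tree) < len(terminals):
        # Prim's step on the metric closure: cheapest terminal pair leaving the tree.
        _, s, t = min((sp[s][0][t], s, t)
                      for s in in_tree for t in terminals if t not in in_tree)
        in_tree.add(t)
        # Expand the closure edge (s, t) into the shortest path it stands for.
        prev, v = sp[s][1], t
        while v != s:
            edges.add(frozenset((prev[v], v)))
            v = prev[v]
    return edges

def edge_cost(adj, edges):
    """Total weight of a set of undirected edges."""
    weight = {frozenset((u, v)): w for u in adj for v, w in adj[u]}
    return sum(weight[e] for e in edges)

# Hypothetical instance: terminals a, b, d; optimal tree is the star through c (cost 30).
adj = {
    "a": [("c", 10), ("b", 18)],
    "b": [("c", 10), ("a", 18), ("d", 18)],
    "d": [("c", 10), ("b", 18)],
    "c": [("a", 10), ("b", 10), ("d", 10)],
}
solution = steiner_2approx(adj, ["a", "b", "d"])
```

On this instance the heuristic returns the path a-b-d with cost 36, while the optimal tree uses the non-terminal vertex c and costs 30. The gap stays within the factor-2 guarantee and shows the kind of room a learned Steiner point selector could exploit.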

8.6 HC Jan 20, 2021 Code

On the Readability of Abstract Set Visualizations

Markus Wallinger, Ben Jacobsen, Stephen Kobourov et al.

Set systems are used to model data that naturally arises in many contexts: social networks have communities, musicians have genres, and patients have symptoms. Visualizations that accurately reflect the information in the underlying set system make it possible to identify the set elements, the sets themselves, and the relationships between the sets. In static contexts, such as print media or infographics, it is necessary to capture this information without the help of interactions. With this in mind, we consider three different systems for medium-sized set data, LineSets, EulerView, and MetroSets, and report the results of a controlled human-subjects experiment comparing their effectiveness. Specifically, we evaluate the performance, in terms of time and error, on tasks that cover the spectrum of static set-based tasks. We also collect and analyze qualitative data about the three different visualization systems. Our results include statistically significant differences, suggesting that MetroSets performs and scales better.

0.2 CL Oct 15, 2020

The Language of Food during the Pandemic: Hints about the Dietary Effects of Covid-19

Hoang Van, Ahmad Musa, Mihai Surdeanu et al.

We study the language of food on Twitter during the pandemic lockdown in the United States, focusing on the two-month period of March 15 to May 15, 2020. Specifically, we analyze over 770,000 tweets published during the lockdown and the equivalent period in the five previous years and highlight several worrying trends. First, we observe that during the lockdown there was a notable shift from mentions of healthy foods to unhealthy foods. Second, we show an increased pointwise mutual information of depression hashtags with food-related tweets posted during the lockdown and an increased association between depression hashtags and unhealthy foods, tobacco, and alcohol during the lockdown.
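
Pointwise mutual information (PMI) measures how much more often two terms co-occur than they would if they were independent: PMI(a, b) = log2( p(a, b) / (p(a) p(b)) ). The sketch below estimates it from document-level co-occurrence counts. The toy corpus, the representation of tweets as sets of tokens, and the function name are assumptions for illustration and are not taken from the paper's pipeline.

```python
import math

def pmi(docs, a, b):
    """Base-2 pointwise mutual information of terms a and b co-occurring in a document."""
    n = len(docs)
    count_a = sum(1 for d in docs if a in d)
    count_b = sum(1 for d in docs if b in d)
    count_ab = sum(1 for d in docs if a in d and b in d)
    if count_ab == 0:
        return float("-inf")  # the terms never co-occur
    # p(a, b) / (p(a) * p(b)) = (count_ab / n) / ((count_a / n) * (count_b / n))
    return math.log2(count_ab * n / (count_a * count_b))

# Hypothetical toy corpus: each tweet is reduced to a set of tokens and hashtags.
tweets = [
    {"#depressed", "pizza"},
    {"#depressed", "pizza"},
    {"salad"},
    {"pizza"},
]
score = pmi(tweets, "#depressed", "pizza")
```

A positive score means the two terms appear together more often than independence would predict. Comparing scores computed on lockdown tweets with scores from earlier years is the kind of contrast the abstract describes.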

6.6 GR Aug 21, 2020

MetroSets: Visualizing Sets as Metro Maps

Ben Jacobsen, Markus Wallinger, Stephen Kobourov et al.

We propose MetroSets, a new, flexible online tool for visualizing set systems using the metro map metaphor. We model a given set system as a hypergraph $\mathcal{H} = (V, \mathcal{S})$, consisting of a set $V$ of vertices and a set $\mathcal{S}$, which contains subsets of $V$ called hyperedges. Our system then computes a metro map representation of $\mathcal{H}$, where each hyperedge $E$ in $\mathcal{S}$ corresponds to a metro line and each vertex corresponds to a metro station. Vertices that appear in two or more hyperedges are drawn as interchanges in the metro map, connecting the different sets. MetroSets is based on a modular 4-step pipeline which constructs and optimizes a path-based hypergraph support, which is then drawn and schematized using metro map layout algorithms. We propose and implement multiple algorithms for each step of the MetroSets pipeline and provide a functional prototype with easy-to-use preset configurations. Furthermore, using several real-world datasets, we perform an extensive quantitative evaluation of the impact of different pipeline stages on desirable properties of the generated maps, such as octolinearity, monotonicity, and edge uniformity.

5.9 DS Sep 13, 2019 Code

Multi-Perspective, Simultaneous Embedding

Md Iqbal Hossain, Vahan Huroyan, Stephen Kobourov et al.

We describe MPSE: a Multi-Perspective Simultaneous Embedding method for visualizing high-dimensional data, based on multiple pairwise distances between the data points. Specifically, MPSE computes positions for the points in 3D and provides different views into the data by means of 2D projections (planes) that preserve each of the given distance matrices. We consider two versions of the problem: fixed projections and variable projections. MPSE with fixed projections takes as input a set of pairwise distance matrices defined on the data points, along with the same number of projections, and embeds the points in 3D so that the pairwise distances are preserved in the given projections. MPSE with variable projections takes as input a set of pairwise distance matrices and embeds the points in 3D while also computing the appropriate projections that preserve the pairwise distances. The proposed approach can be useful in multiple scenarios: from creating simultaneous embedding of multiple graphs on the same set of vertices, to reconstructing a 3D object from multiple 2D snapshots, to analyzing data from multiple points of view. We provide a functional prototype of MPSE that is based on an adaptive and stochastic generalization of multi-dimensional scaling to multiple distances and multiple variable projections. We provide an extensive quantitative evaluation with datasets of different sizes and using different numbers of projections, as well as several examples that illustrate the quality of the resulting solutions.

4.3 CG Aug 4, 2019 Code

Stress-Plus-X (SPX) Graph Layout

Sabin Devkota, Reyan Ahmed, Felice De Luca et al.

Stress, edge crossings, and crossing angles play an important role in the quality and readability of graph drawings. Most standard graph drawing algorithms optimize one of these criteria which may lead to layouts that are deficient in other criteria. We introduce an optimization framework, Stress-Plus-X (SPX), that simultaneously optimizes stress together with several other criteria: edge crossings, minimum crossing angle, and upwardness (for directed acyclic graphs). SPX achieves results that are close to the state-of-the-art algorithms that optimize these metrics individually. SPX is flexible and extensible and can optimize a subset or all of these criteria simultaneously. Our experimental analysis shows that our joint optimization approach is successful in drawing graphs with good performance across readability criteria.

3.4 CV Jul 1, 2019 Code

Symmetry Detection and Classification in Drawings of Graphs

Felice De Luca, Md Iqbal Hossain, Stephen Kobourov

Symmetry is a key feature observed in nature (from flowers and leaves, to butterflies and birds) and in human-made objects (from paintings and sculptures, to manufactured objects and architectural design). Rotational, translational, and especially reflectional symmetries are also important in drawings of graphs. Detecting and classifying symmetries can be very useful in algorithms that aim to create symmetric graph drawings, and in this paper we present a machine learning approach for these tasks. Specifically, we show that deep neural networks can be used to detect reflectional symmetries with 92% accuracy. We also build a multi-class classifier to distinguish between reflectional horizontal, reflectional vertical, rotational, and translational symmetries. Finally, we make available a collection of images of graph drawings with specific symmetric features that can be used in machine learning systems for training, testing and validation purposes. Our datasets, best trained ML models, and source code are available online.

2.3 CG Jun 14, 2019 Code

Multi-level tree based approach for interactive graph visualization with semantic zoom

Felice De Luca, Iqbal Hossain, Kathryn Gray et al.

Human subject studies show that map-like visualizations are as good as or better than standard node-link representations of graphs, in terms of task performance, memorization and recall of the underlying data, and engagement [SSKB14, SSKB15]. With this in mind, we propose the Zoomable Multi-Level Tree (ZMLT) algorithm for multi-level tree-based, map-like visualization of large graphs. We propose seven desirable properties that such a visualization should maintain and an algorithm that accomplishes them. (1) The abstract trees represent the underlying graph appropriately at different levels of detail; (2) The embedded trees represent the underlying graph appropriately at different levels of detail; (3) At every level of detail we show real vertices and real paths from the underlying graph; (4) If any node or edge appears in a given level, then it also appears in all deeper levels; (5) All nodes at the current level and higher levels are labeled and there are no label overlaps; (6) There are no edge crossings on any level; (7) The drawing area is proportional to the total area of the labels. This algorithm is implemented and we have a functional prototype for the interactive interface in a web browser.

7.3 SI Sep 1, 2017

Drawing Dynamic Graphs Without Timeslices

Paolo Simonetto, Daniel Archambault, Stephen Kobourov

Timeslices are often used to draw and visualize dynamic graphs. While timeslices are a natural way to think about dynamic graphs, they are routinely imposed on continuous data. Often, it is unclear how many timeslices to select: too few timeslices can miss temporal features such as causality or even graph structure, while too many timeslices slow the drawing computation. We present a model for dynamic graphs which is not based on timeslices, and a dynamic graph drawing algorithm, DynNoSlice, to draw graphs in this model. In our evaluation, we demonstrate the advantages of this approach over timeslicing on continuous data sets.

7.7 HC Aug 31, 2017

Revisited Experimental Comparison of Node-Link and Matrix Representations

Mershack Okoe, Radu Jianu, Stephen Kobourov

Visualizing network data is applicable in domains such as biology, engineering, and social sciences. We report the results of a study comparing the effectiveness of the two primary techniques for showing network data: node-link diagrams and adjacency matrices. Specifically, an evaluation with a large number of online participants revealed statistically significant differences between the two visualizations. Our work adds to existing research in several ways. First, we explore a broad spectrum of network tasks, many of which had not been previously evaluated. Second, our study uses a large dataset, typical of many real-life networks not explored by previous studies. Third, we leverage crowdsourcing to evaluate many tasks with many participants.

2.2 IR Jun 15, 2017

Research Topics Map: rtopmap

Md Iqbal Hossain, Stephen Kobourov

In this paper we describe a system for visualizing and analyzing worldwide research topics, rtopmap. We gather data from Google Scholar academic research profiles, putting together a weighted topics graph, consisting of over 35,000 nodes and 646,000 edges. The nodes correspond to self-reported research topics, and edges correspond to co-occurring topics in Google Scholar profiles. The rtopmap system supports zooming/panning/searching and other Google-Maps-based interactive features. With the help of map overlays, we also visualize the strengths and weaknesses of different academic institutions in terms of human resources (e.g., number of researchers in different areas), as well as scholarly output (e.g., citation counts in different areas). Finally, we also visualize what parts of the map are associated with different academic departments, or with specific documents (such as research papers, or calls for proposals). The system itself is available at http://rtopmap.arl.arizona.edu/.

17.1 HC May 27, 2016

The State of the Art in Cartograms

Sabrina Nusrat, Stephen Kobourov

Cartograms combine statistical and geographical information in thematic maps, where areas of geographical regions (e.g., countries, states) are scaled in proportion to some statistic (e.g., population, income). Cartograms make it possible to gain insight into patterns and trends in the world around us and have been very popular visualizations for geo-referenced data for over a century. This work surveys cartogram research in visualization, cartography and geometry, covering a broad spectrum of different cartogram types: from the traditional rectangular and table cartograms, to Dorling and diffusion cartograms. A particular focus is the study of the major cartogram dimensions: statistical accuracy, geographical accuracy, and topological accuracy. We review the history of cartograms, describe the algorithms for generating them, and consider task taxonomies. We also review quantitative and qualitative evaluations, and we use these to arrive at design guidelines and research challenges.

11.3 CL Mar 11, 2016

Towards using social media to identify individuals at risk for preventable chronic illness

Dane Bell, Daniel Fried, Luwen Huangfu et al.

We describe a strategy for the acquisition of training data necessary to build a social-media-driven early detection system for individuals at risk for (preventable) type 2 diabetes mellitus (T2DM). The strategy uses a game-like quiz with data and questions acquired semi-automatically from Twitter. The questions are designed to inspire participant engagement and collect relevant data to train a public-health model applied to individuals. Prior systems designed to use social media such as Twitter to predict obesity (a risk factor for T2DM) operate on entire communities such as states, counties, or cities, based on statistics gathered by government agencies. Because there is considerable variation among individuals within these groups, training data on the individual level would be more effective, but this data is difficult to acquire. The approach proposed here aims to address this issue. Our strategy has two steps. First, we trained a random forest classifier on data gathered from (public) Twitter statuses and state-level statistics with state-of-the-art accuracy. We then converted this classifier into a 20-questions-style quiz and made it available online. In doing so, we achieved high engagement with individuals that took the quiz, while also building a training set of voluntarily supplied individual-level data for future classification.

18.8 HC Apr 9, 2015

Evaluating Cartogram Effectiveness

Sabrina Nusrat, Md. Jawaherul Alam, Stephen G. Kobourov

Cartograms are maps in which areas of geographic regions (countries, states) appear in proportion to some variable of interest (population, income). Cartograms are popular visualizations for geo-referenced data that have been used for over a century and that make it possible to gain insight into patterns and trends in the world around us. Despite the popularity of cartograms and the large number of cartogram types, there are few studies evaluating the effectiveness of cartograms in conveying information. Based on a recent task taxonomy for cartograms, we evaluate four major different types of cartograms: contiguous, non-contiguous, rectangular, and Dorling cartograms. Specifically, we evaluate the effectiveness of these cartograms by quantitative performance analysis, as well as by subjective preferences. We analyze the results of our study in the context of some prevailing assumptions in the literature of cartography and cognitive science. Finally, we make recommendations for the use of different types of cartograms for different tasks and settings.

3.3 HC Mar 2, 2015

Towards Understanding Enjoyment and Flow in Information Visualization

Bahador Saket, Carlos Scheidegger, Stephen Kobourov

Traditionally, evaluation studies in information visualization have measured effectiveness by assessing performance time and accuracy. More recently, there has been a concerted effort to understand aspects beyond time and errors. In this paper we study enjoyment, which, while arguably not the primary goal of visualization, has been shown to impact performance and memorability. Different models of enjoyment have been proposed in psychology, education and gaming; yet there is no standard approach to evaluate and measure enjoyment in visualization. In this paper we relate the flow model of Csikszentmihalyi to Munzner's nested model of visualization evaluation and previous work in the area. We suggest that, even though previous papers tackled individual elements of flow, in order to understand what specifically makes a visualization enjoyable, it might be necessary to measure all specific elements.

5.8 HC Feb 26, 2015

Visualizing Cartograms: Goals and Task Taxonomy

Sabrina Nusrat, Stephen Kobourov

Cartograms are maps in which areas of geographic regions (countries, states) appear in proportion to some variable of interest (population, income). Cartograms are popular visualizations for geo-referenced data that have been around for over a century. Newspapers, magazines, textbooks, blogs, and presentations frequently employ cartograms to show voting results, popularity, and in general, geographic patterns. Despite the popularity of cartograms and the large number of cartogram variants, there are very few studies evaluating the effectiveness of cartograms in conveying information. In order to design cartograms as a useful visualization tool and to be able to compare the effectiveness of cartograms generated by different methods, we need to study the nature of information conveyed and the specific tasks that can be performed on cartograms. In this paper we consider a set of cartogram visualization tasks, based on standard taxonomies from cartography and information visualization. We then propose a cartogram task taxonomy that can be used to organize not only the tasks considered here but also other tasks that might be added later.

9.7 CL Sep 8, 2014

Analyzing the Language of Food on Social Media

Daniel Fried, Mihai Surdeanu, Stephen Kobourov et al.

We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps, semantics-preserving wordclouds and temporal histograms, allow us to discover more complex, global patterns mirrored in the language of food.

21.7 HC Apr 7, 2014

Node, Node-Link, and Node-Link-Group Diagrams: An Evaluation

Bahador Saket, Paolo Simonetto, Stephen Kobourov et al.

Effectively showing the relationships between objects in a dataset is one of the main tasks in information visualization. Typically there is a well-defined notion of distance between pairs of objects, and traditional approaches such as principal component analysis or multi-dimensional scaling are used to place the objects as points in 2D space, so that similar objects are close to each other. In another typical setting, the dataset is visualized as a network graph, where related nodes are connected by links. More recently, datasets are also visualized as maps, where in addition to nodes and links, there is an explicit representation of groups and clusters. We consider these three techniques, characterized by a progressive increase of the amount of encoded information: node diagrams, node-link diagrams and node-link-group diagrams. We assess these three types of diagrams with a controlled experiment that covers nine different tasks falling broadly in three categories: node-based tasks, network-based tasks and group-based tasks. Our findings indicate that adding links, or links and group representations, does not negatively impact performance (time and accuracy) of node-based tasks. Similarly, adding group representations does not negatively impact the performance of network-based tasks. Node-link-group diagrams outperform the others on group-based tasks. These conclusions contradict results in other studies, in similar but subtly different settings. Taken together, however, such results can have significant implications for the design of standard and domain-specific visualization tools.

17.3HCMar 21, 2014

Group-Level Graph Visualization Taxonomy

Bahador Saket, Paolo Simonetto, Stephen Kobourov

Task taxonomies for graph and network visualizations focus on tasks commonly encountered when analyzing graph connectivity and topology. However, in many application fields such as the social sciences (social networks), biology (protein interaction models), and software engineering (program call graphs), connectivity and topology information is intertwined with group, clustering, and hierarchical information. Several recent visualization techniques, such as BubbleSets, LineSets and GMap, make explicit use of grouping and clustering, but evaluating such visualizations has been difficult due to the lack of standardized group-level tasks. With this in mind, our goal is to define a new set of tasks that assess group-level comprehension. We propose several types of group-level tasks and provide several examples of each type. Finally, we characterize some of the proposed tasks using the multi-level typology of abstract visualization tasks. We believe that adding group-level tasks to the task taxonomy for graph visualization would make the taxonomy more useful for recent graph visualization techniques. It would help evaluators define and categorize new tasks, and it would help generalize individual results collected in controlled experiments.

1.2DSApr 23, 2013

On Semantic Word Cloud Representation

Lukas Barth, Stephen Kobourov, Sergey Pupyrev et al.

We study the problem of computing semantic-preserving word clouds in which semantically related words are close to each other. While several heuristic approaches have been described in the literature, we formalize the underlying geometric algorithm problem: Word Rectangle Adjacency Contact (WRAC). In this model each word is associated with a rectangle of fixed dimensions, and the goal is to represent semantically related words by ensuring that the two corresponding rectangles touch. We design and analyze efficient polynomial-time algorithms for some variants of the WRAC problem, show that several general variants are NP-hard, and describe a number of approximation algorithms. Finally, we experimentally demonstrate that our theoretically sound algorithms outperform the early heuristics.
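
To make the contact condition in the WRAC model concrete, the following is a minimal sketch, not the paper's algorithm. It only checks a given layout: it tests whether two axis-aligned word rectangles touch and counts how many desired word pairs a layout realizes. The names `Rect`, `in_contact` and `realized_adjacencies` are hypothetical, and the sketch assumes that "touch" means sharing a boundary segment of positive length, so corner-only contact does not count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned word box: lower-left corner (x, y), fixed width and height."""
    x: float
    y: float
    w: float
    h: float


def overlap_len(a_lo, a_hi, b_lo, b_hi):
    """Signed length of the overlap of intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return min(a_hi, b_hi) - max(a_lo, b_lo)


def in_contact(a, b, eps=1e-9):
    """True if a and b share a boundary segment of positive length.

    Assumption: corner-only contact and interior overlap do not count.
    """
    dx = overlap_len(a.x, a.x + a.w, b.x, b.x + b.w)
    dy = overlap_len(a.y, a.y + a.h, b.y, b.y + b.h)
    # The boxes abut on one axis (overlap is zero there) and
    # their projections overlap with positive length on the other axis.
    return (abs(dx) <= eps and dy > eps) or (abs(dy) <= eps and dx > eps)


def realized_adjacencies(layout, wanted_pairs):
    """Count the desired word pairs whose rectangles touch in the layout."""
    return sum(1 for u, v in wanted_pairs if in_contact(layout[u], layout[v]))


layout = {
    "graph": Rect(0, 0, 4, 1),
    "drawing": Rect(4, 0, 3, 1),  # shares a vertical side with "graph"
    "cloud": Rect(4, 1, 2, 1),    # sits on "drawing", meets "graph" at a corner only
}
wanted = [("graph", "drawing"), ("graph", "cloud"), ("drawing", "cloud")]
print(realized_adjacencies(layout, wanted))
```

A checker like this only evaluates a candidate layout; finding a layout that realizes as many desired contacts as possible is the hard part that the abstract's algorithms and hardness results address.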

13.8IRApr 9, 2013

Maps of Computer Science

Daniel Fried, Stephen G. Kobourov

We describe a practical approach for visual exploration of research papers. Specifically, we use the titles of papers from the DBLP database to create what we call maps of computer science (MoCS). Words and phrases from the paper titles are the cities in the map, and countries are created based on word and phrase similarity, calculated using co-occurrence. With the help of heatmaps, we can visualize the profile of a particular conference or journal over the base map. Similarly, heatmap profiles can be made of individual researchers or groups such as a department. The visualization system also makes it possible to change the data used to generate the base map. For example, a specific journal or conference can be used to generate the base map and then the heatmap overlays can be used to show the evolution of research topics in the field over the years. As before, the profiles of individual researchers or research groups can be visualized using heatmap overlays, but this time over the journal or conference base map. Finally, research papers or abstracts easily generate visual abstracts giving a visual representation of the distribution of topics in the paper. We outline a modular and extensible system for term extraction using natural language processing techniques, and show the applicability of methods of information retrieval to calculation of term similarity and creation of a topic map. The system is available at mocs.cs.arizona.edu.
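
As an illustration of similarity "calculated using co-occurrence", here is a minimal sketch under stated assumptions. It treats each lowercase whitespace-separated token of a title as a term and scores a pair of terms by the Jaccard coefficient of the sets of titles in which they appear. The abstract does not say which co-occurrence measure or term-extraction method the MoCS system uses, so the tokenization, the Jaccard choice, and the function name `cooccurrence_similarity` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_similarity(titles):
    """Map each co-occurring term pair to a Jaccard similarity in (0, 1].

    Assumption: terms are lowercase whitespace tokens, and two terms
    co-occur when they appear in the same title.
    """
    # For every term, record the indices of the titles that contain it.
    term_titles = defaultdict(set)
    for index, title in enumerate(titles):
        for term in set(title.lower().split()):
            term_titles[term].add(index)

    similarities = {}
    for a, b in combinations(sorted(term_titles), 2):
        shared = term_titles[a] & term_titles[b]
        if shared:  # keep only pairs that co-occur at least once
            union = term_titles[a] | term_titles[b]
            similarities[(a, b)] = len(shared) / len(union)
    return similarities


titles = [
    "graph drawing algorithms",
    "graph embedding",
    "word cloud drawing",
]
sims = cooccurrence_similarity(titles)
for pair, score in sorted(sims.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

In a map-style visualization, scores of this kind could serve as edge weights of a term graph that is then clustered into "countries"; that downstream step is not shown here.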