HC Jun 9, 2021
An Extensible Dashboard Architecture For Visualizing Base And Analyzed Data
Abhishek Santra, Kunal Samant, Endrit Memeti et al.
Any data analysis, especially of data sets that change often or in real time, consists of at least three important synchronized components: i) figuring out what to infer (objectives), ii) analysis or computation of objectives, and iii) understanding of the results, which may require drill-down and/or visualization. Research pays a lot of attention to the first two components, whereas understanding the results and deriving actionable decisions from them remains quite tricky. Visualization is an important step towards both understanding (even by non-experts) and inferring the actions that need to be taken. As an example, for Covid-19, knowing which regions (say, at the county or state level) have seen a spike, or are prone to a spike in cases in the near future, may warrant additional actions with respect to gatherings, business opening hours, etc. This paper focuses on an extensible architecture for visualization of base as well as analyzed data. It proposes a modular dashboard architecture for user interaction, visualization management, and complex analysis of base data. The contributions of this paper are: i) extensibility of the architecture, providing flexibility to add analyses, visualizations, and user interactions without changing the workflow, ii) decoupling of the functional modules to ease and speed up development by different groups, and iii) addressing efficiency issues for display response time. This paper uses Multilayer Networks (or MLNs) for analysis. To showcase the above, we present the implementation of a visualization dashboard, termed CoWiz++ (for Covid Wizard), and elaborate on how web-based user interaction and display components are interfaced seamlessly with the back-end modules.
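One way to picture the extensibility and decoupling claims is a plug-in registry, sketched below in Python. This is only an illustration under stated assumptions, not the CoWiz++ implementation: the names `ANALYSES`, `VISUALIZATIONS`, `register`, and `handle_request`, the "cases at least doubled" spike rule, and the toy county figures are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registries: each module registers itself by name, so the
# workflow in handle_request never changes when a module is added.
ANALYSES: Dict[str, Callable[[List[dict]], List[dict]]] = {}
VISUALIZATIONS: Dict[str, Callable[[List[dict]], str]] = {}


def register(registry: dict, name: str):
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(fn):
        registry[name] = fn
        return fn
    return decorator


@register(ANALYSES, "spike")
def spike_regions(rows: List[dict]) -> List[dict]:
    """Keep regions whose case count at least doubled (an assumed toy rule)."""
    return [r for r in rows if r["cases_now"] >= 2 * r["cases_before"]]


@register(VISUALIZATIONS, "text_table")
def text_table(rows: List[dict]) -> str:
    """Render rows as plain text; stands in for a real display component."""
    return "\n".join(
        f"{r['region']}: {r['cases_before']} -> {r['cases_now']}" for r in rows
    )


def handle_request(rows: List[dict], analysis: str, visualization: str) -> str:
    """Fixed workflow: look up the analysis, run it, then render the result."""
    analyzed = ANALYSES[analysis](rows)
    return VISUALIZATIONS[visualization](analyzed)


# Made-up base data, for illustration only.
base_data = [
    {"region": "County A", "cases_before": 10, "cases_now": 25},
    {"region": "County B", "cases_before": 40, "cases_now": 42},
]
output = handle_request(base_data, "spike", "text_table")
print(output)
```

In this sketch, adding an analysis or a visualization means registering one more function; `handle_request` stays untouched, which is the property the abstract describes as extensibility without changing the workflow.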
SI May 24, 2021
From Base Data To Knowledge Discovery -- A Life Cycle Approach -- Using Multilayer Networks
Abhishek Santra, Kanthi Komar, Sanjukta Bhowmick et al.
Any large, complex data analysis to infer or discover meaningful information/knowledge involves the following steps (in addition to data collection, cleaning, and preparing the data for analysis, such as attribute elimination): i) modeling the data -- choosing a modeling approach and deriving a data representation for analysis using that approach, ii) translating analysis objectives into computations on the model generated; this can be as simple as a single computation (e.g., community detection) or may involve a sequence of operations (e.g., pair-wise community detection over multiple networks) using expressions based on the model, iii) computation of the expressions generated -- efficiency and scalability come into the picture here, and iv) drill-down of results to interpret or understand them clearly. Beyond this, it is also meaningful to visualize results for easier understanding. The Covid-19 visualization dashboard presented in this paper is an example of this. This paper covers all of the above steps of the data analysis life cycle using a data representation that is gaining importance for multi-entity, multi-feature data sets: Multilayer Networks (MLNs). We use several data sets to establish the effectiveness of modeling using MLNs and analyze them using the proposed decoupling approach. For coverage, we use different types of MLNs for modeling, and community and centrality computations for analysis. The data sets used are US commercial airlines, IMDb, DBLP, and Covid-19. Our experimental analyses using the identified steps validate the modeling, the breadth of objectives that can be computed, and the overall versatility of the life cycle approach. Correctness of results is verified, where possible, using independently available ground truth. We demonstrate the drill-down that is afforded by this approach (due to structure and semantics preservation) for a better understanding and visualization of results.
CL May 21, 2019
Generic Multilayer Network Data Analysis with the Fusion of Content and Structure
Xuan-Son Vu, Abhishek Santra, Sharma Chakravarthy et al.
Multi-feature data analysis (e.g., on Facebook, LinkedIn) is challenging, especially if one wants to do it efficiently and retain the flexibility to choose features of interest for analysis. Features (e.g., age, gender, relationship, political view, etc.) can be given explicitly in datasets, but can also be derived from content (e.g., political view based on Facebook posts). Analysis from multiple perspectives is needed to understand the datasets (or subsets of them) and to infer meaningful knowledge. For example, the influence of age, location, and marital status on political views may need to be inferred separately (or in combination). In this paper, we adapt multilayer network (MLN) analysis, a nontraditional approach, to model the Facebook datasets, integrate content analysis, and conduct analysis driven by a list of desired application-based queries. Our experimental analysis shows the flexibility and efficiency of the proposed approach when modeling and analyzing datasets with multiple features.
DB Nov 4, 2016
Scalable Holistic Analysis of Multi-Source, Data-Intensive Problems Using Multilayered Networks
Abhishek Santra, Sanjukta Bhowmick, Sharma Chakravarthy
Holistic analysis of many real-world problems is based on data collected from multiple sources, each contributing to some aspect of the problem. The word fusion has also been used in the literature for such problems involving disparate data types. Holistically understanding traffic patterns, causes of accidents, bombings, terrorist planning, and many natural phenomena such as storms and earthquakes falls into this category. Some problems may have real-time requirements and some may need to be analyzed after the fact (post-mortem or forensic analysis). What is common to all these problems is the amount and types of data associated with the event. Data may also be incomplete, and the trustworthiness of sources may vary. Currently, manual and ad hoc approaches are used to aggregate data in different ways for analyzing and understanding these problems. In this paper, we approach this problem in a novel way using multilayered networks. We identify features of a central event and propose a network layer for each feature. This approach allows us to study the effect of each feature independently and its impact on the event. We also establish that the proposed approach allows us to compose these features in arbitrary ways (without loss of information) to analyze their combined effect. Additionally, formulation of relationships (e.g., a distance measure for a single feature instead of several at the same time) is simpler. Further, computations can be done once on each layer in this approach and reused for mixing and matching the features for aggregate impacts and "what if" scenarios to understand the problem holistically. This has been demonstrated by recreating the communities for the AND-composed network by using the communities of the individual layers. We believe that the techniques proposed here make an important contribution to the nascent yet fast-growing area of data fusion.
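The idea of computing once per layer and reusing the results for an AND-composed network can be sketched on a toy example in Python. This is a simplified illustration under stated assumptions, not the paper's algorithm: connected components stand in for a real community detection method, AND-composition is taken to mean keeping the edges present in both layers, and the two layers are made-up data. Pairwise intersection of per-layer groups happens to match the directly computed result here; that is not guaranteed for arbitrary layers or community definitions.

```python
from itertools import product


def edge(u, v):
    """Undirected edge as a frozenset so (u, v) and (v, u) compare equal."""
    return frozenset((u, v))


def and_compose(layer_a, layer_b):
    """Assumed AND-composition: keep only edges present in both layers."""
    return layer_a & layer_b


def components(nodes, edges):
    """Connected components via union-find (stand-in for community detection)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for e in edges:
        u, v = tuple(e)
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Singletons are dropped: a lone node is not treated as a community.
    return {frozenset(g) for g in groups.values() if len(g) > 1}


def intersect_communities(comms_a, comms_b):
    """Reuse per-layer results: intersect every pair, drop trivial overlaps."""
    return {a & b for a, b in product(comms_a, comms_b) if len(a & b) > 1}


# Two made-up layers over the same six nodes (one layer per feature).
nodes = range(1, 7)
layer_a = {edge(1, 2), edge(2, 3), edge(1, 3), edge(4, 5), edge(5, 6)}
layer_b = {edge(1, 2), edge(2, 3), edge(1, 3), edge(3, 4), edge(5, 6)}

# Route 1: build the composed network, then analyze it from scratch.
direct = components(nodes, and_compose(layer_a, layer_b))

# Route 2: analyze each layer once, then combine the stored results.
reused = intersect_communities(components(nodes, layer_a),
                               components(nodes, layer_b))

print(direct == reused)
```

The second route is what makes "what if" exploration cheap in this sketch: per-layer results are computed once and recombined for each feature combination, rather than re-running the analysis on every composed network.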