Shilpika

HC
6 papers
139 citations
Novelty 34%
AI Score 38

6 Papers

43.2 · DC · Apr 13
Understanding Large-Scale HPC System Behavior Through Cluster-Based Visual Analytics

Allison Austin, Shilpika, Yan To Linus Lam et al.

In high-performance computing (HPC) environments, system monitoring data is often unlabeled and high-dimensional, making it difficult to reliably detect and understand anomalous computing nodes. The growing scale and dimensionality of the collected datasets present significant challenges for analysis and visualization tasks. We present a scalable, interactive visual analytics system to support exploration, explanation, and comparison of compute node behaviors in HPC systems. Our approach integrates an analysis workflow combining two-phase dimensionality reduction with contrastive learning and multi-resolution dynamic mode decomposition to capture inter- and intra-cluster variations. These analyses are embedded in an interactive interface that enables users to explore clusters, compare temporal patterns, and iteratively refine hypotheses through customizable visual encodings and baselines. By integrating metrics such as CPU utilization and memory activity, the system offers a holistic view of large-scale system behavior. We demonstrate the utility of our tool through two case studies. In both cases, our system automatically identified meaningful node clusters and revealed subtle behavioral differences within and across node groups. Expert feedback confirmed the effectiveness of our tool in enhancing anomalous behavior detection and interpretation. Our work advances scalable visual analysis for HPC monitoring and has broader implications for cloud, edge computing, and distributed infrastructures where interpretability and behavior analysis are critical to operational efficiency.

HC · Jun 15, 2023
A Multi-Level, Multi-Scale Visual Analytics Approach to Assessment of Multifidelity HPC Systems

Shilpika, Bethany Lusch, Murali Emani et al.

The ability to monitor and interpret hardware system events and behaviors is crucial to improving the robustness and reliability of these systems, especially in a supercomputing facility. The growing complexity and scale of these systems demand an increase in monitoring data collected at multiple fidelity levels and varying temporal resolutions. In this work, we aim to build a holistic analytical system that helps make sense of such massive data, mainly the hardware logs, job logs, and environment logs collected from disparate subsystems and components of a supercomputer system. This end-to-end log analysis system, coupled with visual analytics support, allows users to promptly extract supercomputer usage and error patterns at varying temporal and spatial resolutions. We use multiresolution dynamic mode decomposition (mrDMD), a technique that depicts high-dimensional data as correlated spatial-temporal variation patterns, or modes, to extract variation patterns isolated at specified frequencies. Our improvements to the mrDMD algorithm help promptly reveal useful information in the massive environment log dataset, which is then associated with the processed hardware and job log datasets using our visual analytics system. Furthermore, our system can identify the usage and error patterns filtered at user, project, and subcomponent levels. We exemplify the effectiveness of our approach with two use scenarios with the Cray XC40 supercomputer.
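
As a rough illustration of the building block behind mrDMD, the sketch below implements plain (single-resolution) dynamic mode decomposition with NumPy. It is not the authors' improved mrDMD algorithm, which the abstract does not detail; the function name `dmd` and the synthetic two-sensor signal are assumptions made for the example. mrDMD applies a decomposition like this recursively over shorter time windows to separate slow and fast modes.

```python
import numpy as np

def dmd(snapshots, rank):
    """Plain DMD of a (n_sensors, n_timesteps) snapshot matrix.

    Returns the DMD eigenvalues and modes of the best-fit linear
    operator that advances each snapshot to the next one.
    """
    x1 = snapshots[:, :-1]          # states at steps 0 .. T-2
    x2 = snapshots[:, 1:]           # states at steps 1 .. T-1
    u, s, vh = np.linalg.svd(x1, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank]
    v_sinv = vh.conj().T / s        # V @ diag(1/s), via column scaling
    a_tilde = u.conj().T @ x2 @ v_sinv   # operator projected onto rank-r subspace
    eigvals, w = np.linalg.eig(a_tilde)
    modes = x2 @ v_sinv @ w         # "exact" DMD modes
    return eigvals, modes

# Hypothetical example: two sensors observing one oscillation at 2 rad/s.
dt, omega = 0.1, 2.0
t = np.arange(50) * dt
snapshots = np.vstack([np.cos(omega * t), np.sin(omega * t)])
eigvals, modes = dmd(snapshots, rank=2)
frequencies = np.sort(np.angle(eigvals)) / dt   # rad/s, recovered from eigenvalue phases
```

Each eigenvalue's phase divided by the sampling interval gives the oscillation frequency of its mode, which is what allows variation patterns to be isolated by frequency band.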

SE · Apr 5, 2018 · Code
Metrics Dashboard: A Hosted Platform for Software Quality Metrics

George K. Thiruvathukal, Shilpika, Nicholas J. Hayward et al.

There is an emerging consensus in the scientific software community that progress in scientific research is dependent on the "quality and accessibility of software at all levels" (wssspe.researchcomputing.org.uk/). This progress depends on embracing the best traditional (and emergent) practices in software engineering, especially agile practices that intersect with the more formal tradition of software engineering. As a first step in our larger exploratory project to study in-process quality metrics for software development projects in Computational Science and Engineering (CSE), we have developed the Metrics Dashboard, a platform for producing and observing metrics by mining open-source software repositories on GitHub. Unlike GitHub and similar systems that provide individual performance metrics (e.g., commits), the Metrics Dashboard focuses on metrics indicative of team progress and project health. The Metrics Dashboard allows the user to submit the URL of a hosted repository for batch analysis, whose results are then cached. Upon completion, the user can interactively study various metrics over time (at varying granularity), numerically and visually. The initial version of the system is up and running as a public cloud service (SaaS) and supports project size (KLOC), defect density, defect spoilage, and productivity. While our system is by no means the first to support software metrics, we believe it may be one of the first community-focused extensible resources that can be used by any hosted project.
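
To make one of the listed metrics concrete, here is a minimal sketch of defect density under its conventional definition, defects per thousand lines of code (KLOC). The abstract does not state how the Metrics Dashboard computes it, so the function name, its parameters, and the sample numbers are assumptions for illustration only.

```python
def defect_density(defect_count: int, lines_of_code: int) -> float:
    """Defects per thousand lines of code (conventional definition, assumed here)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    kloc = lines_of_code / 1000.0
    return defect_count / kloc

# Hypothetical repository snapshot: 12 open defects in 48,000 lines of code.
density = defect_density(12, 48_000)
```

Computed per snapshot of a repository's history, a value like this can be plotted over time, which matches the abstract's description of studying metrics at varying granularity.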

HC · Sep 4, 2020
Staged Animation Strategies for Online Dynamic Networks

Tarik Crnovrsanin, Shilpika, Senthil Chandrasegaran et al.

Dynamic networks (networks that change over time) can be categorized into two types: offline dynamic networks, where all states of the network are known, and online dynamic networks, where only the past states of the network are known. Research on staging animated transitions in dynamic networks has focused more on offline data, where rendering strategies can take into account past and future states of the network. Rendering online dynamic networks is a more challenging problem since it requires a balance between timeliness for monitoring tasks (so that the animations do not lag too far behind the events) and clarity for comprehension tasks (to minimize simultaneous changes that may be difficult to follow). To illustrate the challenges posed by these requirements, we explore three strategies to stage animations for online dynamic networks: time-based, event-based, and a new hybrid approach that we introduce by combining the advantages of the first two. We illustrate the advantages and disadvantages of each strategy in representing low- and high-throughput data and conduct a user study involving monitoring and comprehension of dynamic networks. We also conduct a follow-up think-aloud study combining monitoring and comprehension with experts in dynamic network visualization. Our findings show that animation staging strategies that emphasize comprehension do better for participant response times and accuracy. However, the notion of "comprehension" is not always clear when it comes to complex changes in highly dynamic networks, requiring some iteration in staging that the hybrid approach affords. Based on our results, we make recommendations for balancing event-based and time-based parameters for our hybrid approach.
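
The three staging strategies can be sketched as ways of batching a time-ordered event stream into animation stages. The abstract does not define the strategies precisely, so the grouping rules below (fixed time windows, fixed event counts, and a hybrid that closes a stage on whichever limit is reached first) are one hypothetical reading, and all function and parameter names are assumptions.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, description of the change)

def stage_time_based(events: List[Event], window: float) -> List[List[Event]]:
    """Group events into consecutive fixed-length time windows."""
    stages = {}
    for event in events:
        stages.setdefault(int(event[0] // window), []).append(event)
    return [stages[key] for key in sorted(stages)]

def stage_event_based(events: List[Event], max_events: int) -> List[List[Event]]:
    """Group events into stages of at most `max_events` changes each."""
    return [events[i:i + max_events] for i in range(0, len(events), max_events)]

def stage_hybrid(events: List[Event], window: float, max_events: int) -> List[List[Event]]:
    """Close the current stage when it is full, or when the next event arrives
    at least `window` seconds after the stage's first event."""
    stages: List[List[Event]] = []
    current: List[Event] = []
    for event in events:
        too_many = len(current) >= max_events
        too_old = bool(current) and event[0] - current[0][0] >= window
        if too_many or too_old:
            stages.append(current)
            current = []
        current.append(event)
    if current:
        stages.append(current)
    return stages

# Hypothetical stream: a burst of four changes, then a late one.
events = [(0.0, "add n1"), (0.1, "add n2"), (0.2, "add e1"),
          (0.3, "remove n1"), (5.0, "add n3")]
time_sizes = [len(s) for s in stage_time_based(events, window=1.0)]
event_sizes = [len(s) for s in stage_event_based(events, max_events=2)]
hybrid_sizes = [len(s) for s in stage_hybrid(events, window=1.0, max_events=3)]
```

In this reading, the time window bounds how far the animation can lag behind the stream, while the event cap bounds how many simultaneous changes a viewer must follow, which mirrors the timeliness versus clarity trade-off described in the abstract.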

HC · Aug 2, 2020
A Visual Analytics Framework for Reviewing Multivariate Time-Series Data with Dimensionality Reduction

Takanori Fujiwara, Shilpika, Naohisa Sakamoto et al.

Data-driven problem solving in many real-world applications involves analysis of time-dependent multivariate data, for which dimensionality reduction (DR) methods are often used to uncover the intrinsic structure and features of the data. However, DR is usually applied to a subset of data that is either single-time-point multivariate or univariate time-series, resulting in the need to manually examine and correlate the DR results from different data subsets. When the number of dimensions is large, in terms of either the number of time points or the number of attributes, this manual task becomes tedious to the point of infeasibility. In this paper, we present MulTiDR, a new DR framework that enables processing of time-dependent multivariate data as a whole to provide a comprehensive overview of the data. With the framework, we employ DR in two steps. When treating the instances, time points, and attributes of the data as a 3D array, the first DR step reduces the three axes of the array to two, and the second DR step visualizes the data in a lower-dimensional space. In addition, by coupling with a contrastive learning method and interactive visualizations, our framework enhances analysts' ability to interpret DR results. We demonstrate the effectiveness of our framework with four case studies using real-world datasets.
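
The two-step idea can be sketched with NumPy as follows. This is a minimal illustration that uses PCA for both steps and collapses the time axis first; the abstract does not say which DR methods or which axis the framework uses in a given analysis, so those choices, along with the names `pca` and `two_step_dr` and the random test data, are assumptions.

```python
import numpy as np

def pca(matrix, n_components):
    """Project the rows of `matrix` onto its top principal components."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def two_step_dr(data, n_components=2):
    """Two-step DR for an (instances, time points, attributes) array.

    Step 1 collapses the time axis: each attribute's (instances x time)
    slice is reduced to one value per instance.
    Step 2 embeds the resulting (instances x attributes) matrix in a
    low-dimensional space for plotting.
    """
    n_instances, _, n_attrs = data.shape
    collapsed = np.empty((n_instances, n_attrs))
    for a in range(n_attrs):
        collapsed[:, a] = pca(data[:, :, a], 1)[:, 0]
    return pca(collapsed, n_components)

# Hypothetical data: 20 instances, 15 time points, 4 attributes.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 15, 4))
embedding = two_step_dr(data)   # one 2D point per instance
```

Collapsing a different axis in the first step (for example, instances instead of time points) would yield an overview of time points or attributes rather than instances.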

GR · May 10, 2019
An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data

Takanori Fujiwara, Jia-Kai Chou, Shilpika et al.

Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when data is a live streaming feed, conventional DR methods cannot be directly used because of their computational complexity and inability to preserve the projected data positions at previous time points. In addition, the problem becomes even more challenging when the dynamic data records have a varying number of dimensions as often found in real-world applications. This paper presents an incremental DR solution. We enhance an existing incremental PCA method in several ways to ensure its usability for visualizing streaming multidimensional data. First, we use geometric transformation and animation methods to help preserve a viewer's mental map when visualizing the incremental results. Second, to handle data dimension variants, we use an optimization method to estimate the projected data positions, and also convey the resulting uncertainty in the visualization. We demonstrate the effectiveness of our design with two case studies using real-world datasets.
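
One way to picture the mental-map step is to align each new 2D projection with the previous one before drawing it. The sketch below finds the least-squares rotation and translation that map the current positions onto the previous ones. The abstract only says "geometric transformation", so treating it as a rigid alignment is an assumption, as are the function name and sample points.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def align_to_previous(prev: List[Point], curr: List[Point]) -> List[Point]:
    """Rotate and translate `curr` so it best matches `prev` (least squares).

    Both lists must hold the same points in the same order.
    """
    n = len(prev)
    pcx, pcy = sum(p[0] for p in prev) / n, sum(p[1] for p in prev) / n
    ccx, ccy = sum(c[0] for c in curr) / n, sum(c[1] for c in curr) / n
    s_cos = s_sin = 0.0
    for (px, py), (cx, cy) in zip(prev, curr):
        ax, ay = cx - ccx, cy - ccy      # centered current point
        bx, by = px - pcx, py - pcy      # centered previous point
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)     # optimal rotation angle
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(cos_t * (cx - ccx) - sin_t * (cy - ccy) + pcx,
             sin_t * (cx - ccx) + cos_t * (cy - ccy) + pcy)
            for cx, cy in curr]

# Hypothetical case: the new projection is the old one rotated 90 degrees and shifted.
previous = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
current = [(5.0, 5.0), (5.0, 6.0), (4.0, 5.0)]
aligned = align_to_previous(previous, current)
```

This sketch handles rotation and translation only. PCA results can also flip sign along an axis between updates, so a fuller alignment would need to allow reflections and possibly uniform scaling as well.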