Rupesh Nasre

h-index: 5

Papers: 3

Citations: 4

Novelty: 65%

AI Score: 45

Ranked #67,592 of 205,806 authors (top 33%); #328 in DC (top 28%)

3 Papers

59.4 · DC · May 30

StarDist: A Code Generator for Distributed Graph Algorithms

Barenya Kumar Nandy, Rupesh Nasre

We introduce StarDist, a domain-specific language for generating high-performance distributed graph algorithms in the message-passing model. Our analysis-transformation framework optimizes graph traversal based on graph property access patterns, reduces global lock acquisitions on distributed structures, and minimizes the message queues used in reduction operations. We provide a network-optimized communication runtime for reduction operations that couples with our analysis framework and optimizes the propagation of updates based on vertex residency. StarDist identifies monotonic reduction blocks and fuses reduction iterations over graphs into pulses. We evaluate StarDist on three fundamental graph algorithms belonging to the CONGEST model: single-source shortest paths, weakly connected components, and PageRank computation, using a suite of real-world and synthetic graphs across varying densities of topological compaction. Our results show that code generated with StarDist outperforms the distributed frameworks DRONE and D-Galois by an average of 19× and 7×, respectively, on our high-communication setup and by 1.4× and 1.92×, respectively, on our high-congestion network setup, averaged across all three algorithms.
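As a rough illustration of what a monotonic reduction looks like, the sketch below runs single-source shortest paths as repeated min-reductions on a single machine. It is not StarDist syntax or its generated code: the function names `relax_round` and `sssp`, the edge-list layout, and the assumption of non-negative edge weights are all hypothetical choices made for this example.

```python
import math

def relax_round(edges, dist):
    """Apply one round of min-reductions; return True if any distance dropped."""
    changed = False
    for u, v, w in edges:
        candidate = dist[u] + w
        # min is monotonic: a vertex's distance can only decrease,
        # so updates may be applied in any order without losing correctness.
        if candidate < dist[v]:
            dist[v] = candidate
            changed = True
    return changed

def sssp(num_vertices, edges, source):
    """Iterate rounds until no distance changes (assumes non-negative weights)."""
    dist = [math.inf] * num_vertices
    dist[source] = 0
    while relax_round(edges, dist):
        pass
    return dist

# Hypothetical 4-vertex graph as (source, target, weight) triples.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
result = sssp(4, edges, 0)
print(result)
```

Because every update only lowers a value, several such rounds can in principle be batched before values are exchanged; how StarDist actually forms its pulses is not described in the abstract.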

LG · Dec 1, 2025

Morphling: Fast, Fused, and Flexible GNN Training at Scale

Anubhab, Rupesh Nasre

Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations. While frameworks such as PyTorch Geometric (PyG) and Deep Graph Library (DGL) prioritize high-level usability, they fail to address these divergent execution characteristics. As a result, they rely on generic kernels that suffer from poor cache locality, excessive memory movement, and substantial intermediate allocations. To address these limitations, we present Morphling, a domain-specific code synthesizer designed to bridge this gap. Morphling compiles high-level GNN specifications into portable, backend-specialized implementations targeting OpenMP, CUDA, and MPI. It achieves this by instantiating a library of optimized, architecture-aware primitives tailored to each execution environment. Morphling also incorporates a runtime sparsity-aware execution engine that dynamically selects dense or sparse execution paths using input feature statistics, reducing unnecessary computation on zero-valued entries. We evaluate Morphling on eleven real-world datasets spanning diverse graph structures, feature dimensionalities, and sparsity regimes. The results show that Morphling improves per-epoch training throughput by an average of 20X on CPUs and 19X on GPUs over PyG and DGL, with peak speedups reaching 66X. Morphling's memory-efficient layouts further reduce peak memory consumption by up to 15X, enabling large-scale GNN training on commodity hardware. These findings demonstrate that specialized, architecture-aware code synthesis provides an effective and scalable path toward high-performance GNN execution across diverse parallel and distributed platforms.
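The idea of choosing a dense or sparse path from input feature statistics can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of the zero fraction as the statistic, and the 0.5 threshold are hypothetical and are not taken from Morphling.

```python
def zero_fraction(features):
    """Fraction of entries in a row-major feature matrix that are zero."""
    total = sum(len(row) for row in features)
    zeros = sum(1 for row in features for x in row if x == 0)
    return zeros / total if total else 0.0

def dense_matvec(features, weights):
    """Multiply every entry, zero or not."""
    return [sum(x * w for x, w in zip(row, weights)) for row in features]

def sparse_matvec(features, weights):
    """Skip zero-valued entries so no multiply is spent on them."""
    return [sum(x * weights[j] for j, x in enumerate(row) if x != 0)
            for row in features]

def apply_features(features, weights, threshold=0.5):
    """Pick a path at run time; the threshold value is an assumed default."""
    if zero_fraction(features) >= threshold:
        return "sparse", sparse_matvec(features, weights)
    return "dense", dense_matvec(features, weights)

mostly_zero = [[0, 0, 3], [0, 2, 0]]
mostly_full = [[1, 2, 3], [4, 5, 6]]
print(apply_features(mostly_zero, [1, 2, 3]))
print(apply_features(mostly_full, [1, 2, 3]))
```

Both paths return the same numbers; only the amount of work differs, which is why the selection can be made per input without affecting results.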

SE · Apr 23, 2020

BOLD: An Ontology-based Log Debugger for C Programs

Dileep Kumar P, Rupesh Nasre, Sreenivasa Kumar P

The different activities related to debugging, such as program instrumentation, representation of the execution trace, and analysis of the trace, are not typically performed in a unified framework. We propose BOLD, an Ontology-based Log Debugger, to unify and standardize the activities in debugging. The syntactic information of programs can be represented in the form of Resource Description Framework (RDF) triples. Using the BOLD framework, programs can be automatically instrumented by using declarative specifications over these triples. A salient feature of the framework is that it also stores the execution trace of the program as RDF triples, called trace triples. These triples can be queried to implement the common debug operations. The novelty of the framework is that it abstracts these triples as spans for high-level reasoning. A span gives a way of examining the values of a particular variable over a certain portion of the program execution. The properties of spans are defined formally as a Web Ontology Language (OWL) ontology called the Program Debug (PD) Ontology. Using the span abstraction and the PD ontology, end users can debug a given buggy program in a standard manner. A notable feature of using an ontology is that users can debug accurately in some cases of missing information, which can be practically useful. To demonstrate the feasibility of the proposed framework, we have debugged the programs in a standard bug benchmark suite, the Software-artifact Infrastructure Repository (SIR). Experiments show that the querying time is almost the same as in gdb. The reasoning time depends on the sub-language of OWL. We find that the expressiveness offered by the OWL-DL language is sufficient for the bugs in SIR programs, but to achieve scalability in reasoning, a restricted OWL-RL language is required.
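To make the notions of trace triples and spans concrete, here is a minimal sketch that stores a trace as plain (subject, predicate, object) tuples and extracts the values of one variable over a range of steps. The predicate names `step`, `variable`, and `value`, the event identifiers, and the helper functions are hypothetical; they are not the vocabulary of the PD ontology, and a real setup would use an RDF store and query language rather than Python lists.

```python
# Hypothetical trace triples: (subject, predicate, object).
TRACE = [
    ("e1", "step", 1), ("e1", "variable", "i"),   ("e1", "value", 0),
    ("e2", "step", 2), ("e2", "variable", "sum"), ("e2", "value", 0),
    ("e3", "step", 3), ("e3", "variable", "i"),   ("e3", "value", 1),
    ("e4", "step", 4), ("e4", "variable", "sum"), ("e4", "value", 5),
    ("e5", "step", 5), ("e5", "variable", "i"),   ("e5", "value", 2),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def span(triples, variable, first_step, last_step):
    """Collect (step, value) pairs for one variable within a step range."""
    result = []
    for event, _, _ in match(triples, p="variable", o=variable):
        step = match(triples, s=event, p="step")[0][2]
        if first_step <= step <= last_step:
            value = match(triples, s=event, p="value")[0][2]
            result.append((step, value))
    return sorted(result)

i_span = span(TRACE, "i", 1, 4)
print(i_span)
```

A debugging question such as "what values did `i` take before step 5?" then becomes a query over the span rather than a manual walk through the log.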