13.1SEMar 28
Taming the Hydra: Targeted Control-Flow Transformations for Dynamic Symbolic ExecutionCharitha Saumya, Muhammad Hassan, Rohan Gangaraju et al.
Dynamic Symbolic Execution (DSE) suffers from the path explosion problem when the target program has many conditional branches. The classical approach for managing the path explosion problem is dynamic state merging. Dynamic state merging combines similar symbolic program states to avoid the exponential growth in the number of states during DSE. However, state merging still requires solver invocations at each program branch, even when both paths of the branch are feasible. Moreover, the best path search strategy for DSE may not create the best state merging opportunities. Some drawbacks of state merging can be mitigated by compile-time state merging (i.e., branch elimination by converting control-flow into dataflow). In this paper, we propose a non-semantics-preserving but failure-preserving compiler transformation for removing expensive symbolic branches in a program to improve the scalability of DSE. We have developed a framework for detecting spurious bugs that our transformation can insert. Finally, we show that our transformation can significantly improve the performance of DSE on various benchmark programs and help improve the performance of coverage and bug discovery of large real-world programs.
83.9PLMay 1
SoCal: A Language for Memory-Layout Factorization of Recursive DatatypesVidush Singhal, Mikah Kainen, Artem Pelenitsyn et al.
Array-of-structures (AoS) to structure-of-arrays (SoA) is a classic compiler transformation that improves memory locality and enables data-parallel execution. Existing AoS-to-SoA transformations primarily target regular, array-based programs in imperative languages like C and C++. In contrast, many applications manipulate tree-shaped data structures, for example, ASTs in compilers, DOM trees in browsers, and k-d trees in scientific workloads. Prior work improves the performance of functional programs operating on such data by serializing algebraic datatypes (ADTs) into contiguous memory buffers. However, these representations interleave fields within a single buffer, similar to AoS layouts. We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in SoCal, a language for generating factored ADT representations, and implement it in a compiler called Colobus. Colobus automatically transforms functional programs to operate over a serialized, factored layout of recursive ADTs. Our evaluation shows a 1.46x geometric mean speedup on a suite of tree-processing benchmarks.
23.7GRApr 26
Rethinking Collision Detection on GPU Ray Tracing ArchitectureDurga Keerthi Mandarapu, Isaac Fuksman, Artem Pelenitsyn et al.
Discrete Collision Detection (DCD) is a fundamental task in several domains including particle-based physics simulations. Efficient DCD uses indexing structures such as Bounding Volume Hierarchy (BVH), but accelerating irregular BVH traversals demands meticulous efforts to achieve performance. Modern GPUs feature Ray Tracing (RT) architecture that provides hardware acceleration for BVH traversal and optimized drivers for BVH construction. Recent work has attempted to exploit RT architecture to accelerate DCD on spherical particles by reducing DCD to fixed-radius neighbor search. However, this reduction breaks down for particles with different radii, necessitating the use of large bounding boxes that result in a higher number of duplicate collisions and poor performance. To address these limitations, we present Mochi, a new reduction that reformulates DCD on RT architecture by exploiting the symmetry of collision relations to support both uniform and non-uniform spherical particles efficiently. Mochi introduces per-object proxy spheres that decouple BVH bounding volumes from the collision search radius, enabling significantly tighter bounding boxes without sacrificing correctness. Mochi is provably sound and guarantees that all true collisions are detected. We integrate Mochi into an end-to-end particle simulation pipeline and evaluate it across large-scale particle workloads, showing consistent speedups over state-of-the-art BVH-based and RT-based DCD implementations. Mochi generalizes prior RT-based neighbor search formulations while avoiding their fundamental limitations for non-uniform spheres.
LGMay 26, 2023
RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor SearchVani Nagarajan, Durga Mandarapu, Milind Kulkarni
The problem of identifying the k-Nearest Neighbors (kNNS) of a point has proven to be very useful both as a standalone application and as a subroutine in larger applications. Given its far-reaching applicability in areas such as machine learning and point clouds, extensive research has gone into leveraging GPU acceleration to solve this problem. Recent work has shown that using Ray Tracing cores in recent GPUs to accelerate kNNS is much more efficient compared to traditional acceleration using shader cores. However, the existing translation of kNNS to a ray tracing problem imposes a constraint on the search space for neighbors. Due to this, we can only use RT cores to accelerate fixed-radius kNNS, which requires the user to set a search radius a priori and hence can miss neighbors. In this work, we propose TrueKNN, the first unbounded RT-accelerated neighbor search. TrueKNN adopts an iterative approach where we incrementally grow the search space until all points have found their k neighbors. We show that our approach is orders of magnitude faster than existing approaches and can even be used to accelerate fixed-radius neighbor searches.
CRApr 19, 2021
Vectorized Secure Evaluation of Decision ForestsRaghav Malik, Vidush Singhal, Benjamin Gottfried et al.
As the demand for machine learning-based inference increases in tandem with concerns about privacy, there is a growing recognition of the need for secure machine learning, in which secret models can be used to classify private data without the model or data being leaked. Fully Homomorphic Encryption (FHE) allows arbitrary computation to be done over encrypted data, providing an attractive approach to providing such secure inference. While such computation is often orders of magnitude slower than its plaintext counterpart, the ability of FHE cryptosystems to do \emph{ciphertext packing} -- that is, encrypting an entire vector of plaintexts such that operations are evaluated elementwise on the vector -- helps ameliorate this overhead, effectively creating a SIMD architecture where computation can be vectorized for more efficient evaluation. Most recent research in this area has targeted regular, easily vectorizable neural network models. Applying similar techniques to irregular ML models such as decision forests remains unexplored, due to their complex, hard-to-vectorize structures. In this paper we present COPSE, the first system that exploits ciphertext packing to perform decision-forest inference. COPSE consists of a staging compiler that automatically restructures and compiles decision forest models down to a new set of vectorizable primitives for secure inference. We find that COPSE's compiled models outperform the state of the art across a range of decision forest models, often by more than an order of magnitude, while still scaling well.
QUANT-PHJun 22, 2020
Quantum Computing Methods for Supervised LearningViraj Kulkarni, Milind Kulkarni, Aniruddha Pant
The last two decades have seen an explosive growth in the theory and practice of both quantum computing and machine learning. Modern machine learning systems process huge volumes of data and demand massive computational power. As silicon semiconductor miniaturization approaches its physics limits, quantum computing is increasingly being considered to cater to these computational needs in the future. Small-scale quantum computers and quantum annealers have been built and are already being sold commercially. Quantum computers can benefit machine learning research and application across all science and engineering domains. However, owing to its roots in quantum mechanics, research in this field has so far been confined within the purview of the physics community, and most work is not easily accessible to researchers from other disciplines. In this paper, we provide a background and summarize key results of quantum computing before exploring its application to supervised machine learning problems. By eschewing results from physics that have little bearing on quantum computation, we hope to make this introduction accessible to data scientists, machine learning practitioners, and researchers from across disciplines.
LGMar 19, 2020
Survey of Personalization Techniques for Federated LearningViraj Kulkarni, Milind Kulkarni, Aniruddha Pant
Federated learning enables machine learning models to learn from private decentralized data without compromising privacy. The standard formulation of federated learning produces one shared model for all clients. Statistical heterogeneity due to non-IID distribution of data across devices often leads to scenarios where, for some clients, the local models trained solely on their private data perform better than the global shared model thus taking away their incentive to participate in the process. Several techniques have been proposed to personalize global models to work better for individual clients. This paper highlights the need for personalization and surveys recent research on this topic.
CRDec 25, 2019
Grand Challenges in Resilience: Autonomous System Resilience through Design and Runtime MeasuresSaurabh Bagchi, Vaneet Aggarwal, Somali Chaterji et al.
A set of about 80 researchers, practitioners, and federal agency program managers participated in the NSF-sponsored Grand Challenges in Resilience Workshop held on Purdue campus on March 19-21, 2019. The workshop was divided into three themes: resilience in cyber, cyber-physical, and socio-technical systems. About 30 attendees in all participated in the discussions of cyber resilience. This article brings out the substantive parts of the challenges and solution approaches that were identified in the cyber resilience theme. In this article, we put forward the substantial challenges in cyber resilience in a few representative application domains and outline foundational solutions to address these challenges. These solutions fall into two broad themes: resilience-by-design and resilience-by-reaction. We use examples of autonomous systems as the application drivers motivating cyber resilience. We focus on some autonomous systems in the near horizon (autonomous ground and aerial vehicles) and also a little more distant (autonomous rescue and relief). For resilience-by-design, we focus on design methods in software that are needed for our cyber systems to be resilient. In contrast, for resilience-by-reaction, we discuss how to make systems resilient by responding, reconfiguring, or recovering at runtime when failures happen. We also discuss the notion of adaptive execution to improve resilience, execution transparently and adaptively among available execution platforms (mobile/embedded, edge, and cloud). For each of the two themes, we survey the current state, and the desired state and ways to get there. We conclude the paper by looking at the research challenges we will have to solve in the short and the mid-term to make the vision of resilient autonomous systems a reality.