Joseph M. Hellerstein

CR
10papers
1,764citations
Novelty54%
AI Score45

10 Papers

SESep 16, 2022
Operationalizing Machine Learning: An Interview Study

Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein et al.

Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., deploy and maintain ML pipelines in production. The process of operationalizing ML, or MLOps, consists of a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in production. When considered together, these responsibilities seem staggering -- how does anyone do MLOps, what are the unaddressed challenges, and what are the implications for tool builders? We conducted semi-structured ethnographic interviews with 18 MLEs working across many applications, including chatbots, autonomous vehicles, and finance. Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning. We summarize common practices for successful ML experimentation, deployment, and sustaining production performance. Finally, we discuss interviewees' pain points and anti-patterns, with implications for tool design.

DCJun 12, 2020Code
Hindsight Logging for Model Training

Rolando Garcia, Eric Liu, Vikram Sreekanti et al.

In modern Machine Learning, model training is an iterative, experimental process that can consume enormous computation resources and developer time. To aid in that process, experienced model developers log and visualize program variables during training runs. Exhaustive logging of all variables is infeasible. Optimistic logging can be accompanied by program checkpoints; this allows developers to add log statements post-hoc, and "replay" desired log statements from checkpoint -- a process we refer to as hindsight logging. Unfortunately, hindsight logging raises tricky problems in data management and software engineering. Done poorly, hindsight logging can waste resources and generate technical debt embodied in multiple variants of training code. In this paper, we present methodologies for efficient and effective logging practices for model training, with a focus on techniques for hindsight logging. Our goal is for experienced model developers to learn and adopt these practices. To make this easier, we provide an open-source suite of tools for Fast Low-Overhead Recovery (flor) that embodies our design across three tasks: (i) efficient background logging in Python, (ii) adaptable periodic checkpointing, and (iii) an instrumentation library that codifies hindsight logging for efficient and automatic record-replay of model-training. Model developers can use each flor tool separately as they see fit, or they can use flor in hands-free mode, entrusting it to instrument their code end-to-end for efficient record-replay. Our solutions leverage techniques from physiological transaction logs and recovery in database systems. Evaluations on modern ML benchmarks demonstrate that flor can produce fast checkpointing with small user-specifiable overheads (e.g. 7%), and still provide hindsight log replay times orders of magnitude faster than restarting training from scratch.

98.6CCMar 30
On the Complexity of Determinations

Joseph M. Hellerstein

Classical complexity theory measures the cost of computing a function, but many computational tasks require committing to one valid output among several. We introduce determination depth -- the minimum number of sequential layers of irrevocable commitments needed to select a single valid output -- and show it is an orthogonal axis measuring the cost of committing. We exhibit relational tasks whose commitments are constant-time table lookups yet require exponential parallel width to compensate for any reduction in depth. A conservation law couples determination and computational depth: enriching the commitment basis trades layers for step complexity, but the total sequential cost is bounded below. For circuit-encoded specifications, the resulting depth hierarchy captures the polynomial hierarchy ($Σ_{2k}^P$-complete for each fixed $k$, PSPACE-complete for unbounded $k$). Determination depth and computational depth are orthogonal: neither subsumes the other. Stable matching witnesses the independence sharply -- finding a stable matching is in P, yet every finite determination depth arises as the rotation-poset height of some instance. In the distributed setting, the framework recovers the Halpern--Moses common-knowledge impossibility.

CROct 26, 2020
Senate: A Maliciously-Secure MPC Platform for Collaborative Analytics

Rishabh Poddar, Sukrit Kalra, Avishay Yanai et al.

Many organizations stand to benefit from pooling their data together in order to draw mutually beneficial insights -- e.g., for fraud detection across banks, better medical studies across hospitals, etc. However, such organizations are often prevented from sharing their data with each other by privacy concerns, regulatory hurdles, or business competition. We present Senate, a system that allows multiple parties to collaboratively run analytical SQL queries without revealing their individual data to each other. Unlike prior works on secure multi-party computation (MPC) that assume that all parties are semi-honest, Senate protects the data even in the presence of malicious adversaries. At the heart of Senate lies a new MPC decomposition protocol that decomposes the cryptographic MPC computation into smaller units, some of which can be executed by subsets of parties and in parallel, while preserving its security guarantees. Senate then provides a new query planning algorithm that decomposes and plans the cryptographic computation effectively, achieving a performance of up to 145$\times$ faster than the state-of-the-art.

DBMay 10, 2019
Deep Unsupervised Cardinality Estimation

Zongheng Yang, Eric Liang, Amog Kamsetty et al.

Cardinality estimation has long been grounded in statistical tools for density estimation. To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep autoregressive models. However, direct application of these models leads to a limited estimator that is prohibitively expensive to evaluate for range or wildcard predicates. To produce a truly usable estimator, we develop a Monte Carlo integration scheme on top of autoregressive models that can efficiently handle range queries with dozens of dimensions or more. Like classical synopses, our estimator summarizes the data without supervision. Unlike previous solutions, we approximate the joint data distribution without any independence assumptions. Evaluated on real-world datasets and compared against real systems and dominant families of techniques, our estimator achieves single-digit multiplicative error at tail, an up to 90$\times$ accuracy improvement over the second best method, and is space- and runtime-efficient.

DCJan 7, 2019
Keeping CALM: When Distributed Consistency is Easy

Joseph M. Hellerstein, Peter Alvaro

A key concern in modern distributed systems is to avoid the cost of coordination while maintaining consistent semantics. Until recently, there was no answer to the question of when coordination is actually required. In this paper we present an informal introduction to the CALM Theorem, which answers this question precisely by moving up from traditional storage consistency to consider properties of programs. CALM is an acronym for "consistency as logical monotonicity". The CALM Theorem shows that the programs that have consistent, coordination-free distributed implementations are exactly the programs that can be expressed in monotonic logic. This theoretical result has practical implications for developers of distributed applications. We show how CALM provides a constructive application-level counterpart to conventional "systems" wisdom, such as the apparently negative results of the CAP Theorem. We also discuss ways that monotonic thinking can influence distributed systems design, and how new programming language designs and tools can help developers write consistent, coordination-free code.

CRSep 20, 2018
Chorus: a Programming Framework for Building Scalable Differential Privacy Mechanisms

Noah Johnson, Joseph P. Near, Joseph M. Hellerstein et al.

Differential privacy is fast becoming the gold standard in enabling statistical analysis of data while protecting the privacy of individuals. However, practical use of differential privacy still lags behind research progress because research prototypes cannot satisfy the scalability requirements of production deployments. To address this challenge, we present Chorus, a framework for building scalable differential privacy mechanisms which is based on cooperation between the mechanism itself and a high-performance production database management system (DBMS). We demonstrate the use of Chorus to build the first highly scalable implementations of complex mechanisms like Weighted PINQ, MWEM, and the matrix mechanism. We report on our experience deploying Chorus at Uber, and evaluate its scalability on real-world queries.

HCJun 5, 2018
Facilitating Exploration with Interaction Snapshots under High Latency

Yifan Wu, Remco Chang, Joseph M. Hellerstein et al.

Latency is, unfortunately, a reality when working with large datasets. Guaranteeing imperceptible latency for interactivity is often prohibitively expensive: the application developer may be forced to migrate data processing engines or deal with complex error bounds on samples, and to limit the application to users with high network bandwidth. Instead of relying on the backend, we propose a simple UX design---interaction snapshots. Responses of requests from the interactions are asynchronously loaded in "snapshots". With interaction snapshots, users can interact concurrently while the snapshots load. Our user study participants found it useful not to have to wait for each result and easily navigate to prior snapshots. For latency up to 5 seconds, participants were able to complete extrema, threshold, and trend identification tasks with little negative impact.

AIDec 15, 2017
A Berkeley View of Systems Challenges for AI

Ion Stoica, Dawn Song, Raluca Ada Popa et al.

With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by methodological advances in machine learning, by innovations in systems software and architectures, and by the broad accessibility of these technologies. The next generation of AI systems promises to accelerate these developments and increasingly impact our lives via frequent interactions and making (often mission-critical) decisions on our behalf, often in highly personalized contexts. Realizing this promise, however, raises daunting challenges. In particular, we need AI systems that make timely and safe decisions in unpredictable environments, that are robust against sophisticated adversaries, and that can process ever increasing amounts of data across organizations and individuals without compromising confidentiality. These challenges will be exacerbated by the end of the Moore's Law, which will constrain the amount of data these technologies can store and process. In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI's potential to improve lives and society.

DBApr 26, 2012
Distributed GraphLab: A Framework for Machine Learning in the Cloud

Yucheng Low, Joseph Gonzalez, Aapo Kyrola et al.

While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory setting. In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. We develop graph based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency. We also introduce fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can be easily implemented by exploiting the GraphLab abstraction itself. Finally, we evaluate our distributed implementation of the GraphLab abstraction on a large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains over Hadoop-based implementations.