Nadav Rotem

IR
4papers
216citations
Novelty50%
AI Score45

4 Papers

IRMay 11
OpenZL: Using Graphs to Compress Smaller and Faster

Yann Collet, Nick Terrell, W. Felix Handte et al.

In the last few decades, research techniques have improved lossless compression ratios by significantly increasing processing time. However, these techniques have not gained popularity in industry because production systems require high throughput and low resource utilization. Instead, real world improvements in compression are increasingly realized by building application-specific compressors which can exploit knowledge about the structure and semantics of the data being compressed. Application-specific compressor systems outperform even the best generic compressors, but these techniques have severe drawbacks -- they are inherently limited in applicability, are hard to develop, and are difficult to maintain and deploy. In this work, we show that these challenges can be overcome with a new compression strategy. We propose the "graph model" of compression, a new theoretical framework for representing compression as a directed acyclic graph of modular codecs. OpenZL implements this framework and compresses data into a self-describing wire format, any configuration of which can be decompressed by a universal decoder. OpenZL's design enables rapid development of application-specific compressors with minimal code. Experimental results demonstrate that OpenZL achieves superior compression ratios and speeds compared to state-of-the-art general-purpose compressors on a variety of real-world datasets. Compared to ratio-focused deep-learning compressors, OpenZL is competitive on ratio while being many orders of magnitude faster. Internal deployments at Meta have also shown consistent improvements in size and/or speed, with development timelines reduced from months to days. OpenZL thus represents a significant advance in practical, scalable, and maintainable data compression for modern data-intensive applications.

SEOct 19, 2020Code
Warrior1: A Performance Sanitizer for C++

Nadav Rotem, Lee Howes, David Goldblatt

This paper presents Warrior1, a tool that detects performance anti-patterns in C++ libraries. Many programs are slowed down by many small inefficiencies. Large-scale C++ applications are large, complex, and developed by large groups of engineers over a long period of time, which makes the task of identifying inefficiencies difficult. Warrior1 was designed to detect the numerous small performance issues that are the result of inefficient use of C++ libraries. The tool detects performance anti-patterns such as map double-lookup, vector reallocation, short lived objects, and lambda object capture by value. Warrior1 is implemented as an instrumented C++ standard library and an off-line diagnostics tool. The tool is very effective in detecting issues. We demonstrate that the tool is able to find a wide range of performance anti-patterns in a number of popular performance sensitive open source projects.

PLDec 24, 2021
Profile Guided Optimization without Profiles: A Machine Learning Approach

Nadav Rotem, Chris Cummins

Profile guided optimization is an effective technique for improving the optimization ability of compilers based on dynamic behavior, but collecting profile data is expensive, cumbersome, and requires regular updating to remain fresh. We present a novel statistical approach to inferring branch probabilities that improves the performance of programs that are compiled without profile guided optimizations. We perform offline training using information that is collected from a large corpus of binaries that have branch probabilities information. The learned model is used by the compiler to predict the branch probabilities of regular uninstrumented programs, which the compiler can then use to inform optimization decisions. We integrate our technique directly in LLVM, supplementing the existing human-engineered compiler heuristics. We evaluate our technique on a suite of benchmarks, demonstrating some gains over compiling without profile information. In deployment, our technique requires no profiling runs and has negligible effect on compilation time.

LGNov 24, 2018
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Jongsoo Park, Maxim Naumov, Protonu Basu et al.

The application of deep learning techniques resulted in remarkable improvement of machine learning models. In this paper provides detailed characterizations of deep learning models used in many Facebook social network services. We present computational characteristics of our models, describe high performance optimizations targeting existing systems, point out their limitations and make suggestions for the future general-purpose/accelerated inference hardware. Also, we highlight the need for better co-design of algorithms, numerics and computing platforms to address the challenges of workloads often run in data centers.