Haoran Wan

1.8 LG Oct 22, 2022

ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

Zhiying Xu, Jiafan Xu, Hongding Peng et al.

Deep learning models rely on highly optimized tensor libraries for efficient inference on heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors and then optimize loops of operators. However, such a unidirectional, one-off workflow strictly separates graph-level optimization and operator-level optimization into different system layers, missing opportunities for unified tuning. This paper proposes ALT, a compiler that performs joint graph- and operator-level optimizations for deep models. ALT provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions. ALT further integrates an auto-tuning module that jointly optimizes graph-level data layouts and operator-level loops while guaranteeing efficiency. Experimental results show that ALT significantly outperforms state-of-the-art compilers (e.g., Ansor) in terms of both single operator performance (e.g., 1.5x speedup on average) and end-to-end inference performance (e.g., 1.4x speedup on average).

6.6 NI Apr 28

NeuralEmu: in situ Measurement-Driven, ML-based, High-Fidelity 5G Network Emulation

Haoran Wan, Yaxiong Xie, Kyle Jamieson

Current and future applications demand ultra-low latency and consistent throughput, yet they frequently traverse 5G cellular networks and so must cope with volatile packet dynamics, as 5G base station schedulers dynamically react to user workloads and wireless channel conditions. The task of evaluating network algorithms in these environments is hamstrung by current tools: record-and-replay emulators sever the feedback interaction that exists between application end points and a commercial operator's proprietary 5G scheduler, while full-stack simulators rely on overly simplistic scheduling logic. To bridge this reality gap, we present NeuralEmu, a high-fidelity, machine learning-based emulation framework that learns complex 5G scheduler resource allocation behaviors directly from extremely high-resolution network telemetry tools. The first emulator to handle multiple clients, NeuralEmu uses machine learning to dynamically predict resource block allocations and modulation schemes based on instantaneous user buffer occupancy and channel states. To capture realistic cross-user contention, a traffic reconstruction model inverts cellular network scheduling results to recover the underlying traffic patterns of uncontrolled background users. Implemented as a high-performance Linux middlebox emulator, NeuralEmu reduces emulation error relative to the state of the art across various network applications, including by 55% for web-page load time, 57% for WebRTC encoder bit rate, and 51% for cloud gaming packet one-way delay, providing an accurate, standardized testing ground for tomorrow's real-time interactive network protocols and applications.

2 Papers