Daniel Wong

h-index23

5papers

275citations

Novelty57%

AI Score43

Ranked #52,054 of 194,257 authors (top 27%)#11,909 in LG (top 30%)

5 Papers

6.8DCApr 24

Coordinating GPU Data Centers and Power Grid Regulation Service for Exogenous Carbon Benefits

Ali Jahanshahi, Sara Rashidi Golrouye, Osten Anderson et al.

The rapid growth of AI/ML data centers has led to higher energy consumption and carbon emissions. The shift to renewable energy and growing data center energy demands can destabilize the power grid. Power grids rely on frequency regulation reserves, typically fossil-fueled power plants, to stabilize and balance the supply and demand of electricity. This paper sheds light on the hidden carbon emissions of frequency regulation service. Our work explores how modern GPU data centers can coordinate with power grids to reduce the need for fossil-fueled frequency regulation reserves. We first introduce a novel metric, Exogenous Carbon, to quantify grid-side carbon emission reductions resulting from data center participation in regulation service. We additionally introduce EcoCenter, a framework to maximize the amount of frequency regulation provision that GPU data centers can provide, and thus, reduce the amount of frequency regulation reserves necessary. We demonstrate that data center participation in frequency regulation can result in Exogenous carbon savings that can outweigh operational carbon emissions

4.3SDDec 8, 2021

NICE-Beam: Neural Integrated Covariance Estimators for Time-Varying Beamformers

Jonah Casebeer, Jacob Donley, Daniel Wong et al.

Estimating a time-varying spatial covariance matrix for a beamforming algorithm is a challenging task, especially for wearable devices, as the algorithm must compensate for time-varying signal statistics due to rapid pose-changes. In this paper, we propose Neural Integrated Covariance Estimators for Beamformers, NICE-Beam. NICE-Beam is a general technique for learning how to estimate time-varying spatial covariance matrices, which we apply to joint speech enhancement and dereverberation. It is based on training a neural network module to non-linearly track and leverage scene information across time. We integrate our solution into a beamforming pipeline, which enables simple training, faster than real-time inference, and a variety of test-time adaptation options. We evaluate the proposed model against a suite of baselines in scenes with both stationary and moving microphones. Our results show that the proposed method can outperform a hand-tuned estimator, despite the hand-tuned estimator using oracle source separation knowledge.

16.5LGOct 21, 2020

Transferable Graph Optimizers for ML Compilers

Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi et al.

Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code. Current ML compilers rely on heuristics based algorithms to solve these optimization problems one at a time. However, this approach is not only hard to maintain but often leads to sub-optimal solutions especially for newer model architectures. Existing learning based approaches in the literature are sample inefficient, tackle a single optimization problem, and do not generalize to unseen graphs making them infeasible to be deployed in practice. To address these limitations, we propose an end-to-end, transferable deep reinforcement learning method for computational graph optimization (GO), based on a scalable sequential attention mechanism over an inductive graph neural network. GO generates decisions on the entire graph rather than on each individual node autoregressively, drastically speeding up the search compared to prior methods. Moreover, we propose recurrent attention layers to jointly optimize dependent graph optimization tasks and demonstrate 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization. On a diverse set of representative graphs consisting of up to 80,000 nodes, including Inception-v3, Transformer-XL, and WaveNet, GO achieves on average 21% improvement over human experts and 18% improvement over the prior state of the art with 15x faster convergence, on a device placement task evaluated in real systems.

24.8LGOct 2, 2019

Accelerating Deep Learning by Focusing on the Biggest Losers

Angela H. Jiang, Daniel L. -K. Wong, Giulio Zhou et al.

This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward pass to decide whether to use that example to compute gradients and update parameters, or to skip immediately to the next example. By reducing the number of computationally-expensive backpropagation steps performed, Selective-Backprop accelerates training. Evaluation on CIFAR10, CIFAR100, and SVHN, across a variety of modern image models, shows that Selective-Backprop converges to target error rates up to 3.5x faster than with standard SGD and between 1.02--1.8x faster than a state-of-the-art importance sampling approach. Further acceleration of 26% can be achieved by using stale forward pass results for selection, thus also skipping forward passes of low priority examples.

9.5LGSep 28, 2019

GDP: Generalized Device Placement for Dataflow Graphs

Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi et al.

Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device characteristics, finding a reasonable placement is extremely challenging even for domain experts. Most existing automated device placement approaches are impractical due to the significant amount of compute required and their inability to generalize to new, previously held-out graphs. To address both limitations, we propose an efficient end-to-end method based on a scalable sequential attention mechanism over a graph neural network that is transferable to new graphs. On a diverse set of representative deep learning models, including Inception-v3, AmoebaNet, Transformer-XL, and WaveNet, our method on average achieves 16% improvement over human experts and 9.2% improvement over the prior art with 15 times faster convergence. To further reduce the computation cost, we pre-train the policy network on a set of dataflow graphs and use a superposition network to fine-tune it on each individual graph, achieving state-of-the-art performance on large hold-out graphs with over 50k nodes, such as an 8-layer GNMT.