DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
This work addresses device-assignment inefficiencies in asynchronous dataflow systems used for machine learning. Its contribution is incremental: it builds on prior learning-based methods by incorporating expert heuristics and awareness of the underlying system's scheduling mechanism.
The paper studies how to assign the operations of a dataflow graph to devices so as to minimize execution time in asynchronous systems, particularly for machine learning workloads. It reports that Doppler reduces both system execution time and per-episode training time relative to the baselines.
We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing learning-based methods; and (3) exclusive dependence on reinforcement learning, ignoring the structure of effective heuristics designed by experts. In this paper, we propose \textsc{Doppler}, a three-stage framework for training dual-policy networks consisting of 1) a $\mathsf{SEL}$ policy for selecting operations and 2) a $\mathsf{PLC}$ policy for placing chosen operations on devices. Our experiments show that \textsc{Doppler} outperforms all baseline methods across tasks by reducing system execution time and additionally demonstrates sampling efficiency by reducing per-episode training time.
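To make the dual-policy structure concrete, the following is a minimal Python sketch of an assignment loop in which one policy selects the next operation and a second policy places it on a device. It is an illustration under stated assumptions, not the implementation from the paper: the names `assign_devices`, `first_candidate`, and `least_loaded` are hypothetical, the rule that an operation becomes a candidate once all of its predecessors are assigned is an assumption, and the two policies shown are hand-written stand-ins for the learned $\mathsf{SEL}$ and $\mathsf{PLC}$ networks. The three training stages are not shown.

```python
from typing import Callable, Dict, List

# Hypothetical graph encoding: each operation maps to its predecessor operations.
Graph = Dict[str, List[str]]
SelPolicy = Callable[[List[str], Dict[str, int]], str]
PlcPolicy = Callable[[str, Dict[str, int], int], int]


def assign_devices(
    preds: Graph,
    num_devices: int,
    sel_policy: SelPolicy,
    plc_policy: PlcPolicy,
) -> Dict[str, int]:
    """Build an op -> device assignment one operation at a time."""
    assignment: Dict[str, int] = {}
    while len(assignment) < len(preds):
        # Assumption: an op is a candidate once all its predecessors are assigned.
        candidates = sorted(
            op
            for op in preds
            if op not in assignment and all(p in assignment for p in preds[op])
        )
        if not candidates:
            raise ValueError("graph contains a cycle or an unknown predecessor")
        op = sel_policy(candidates, assignment)  # stand-in for the SEL policy
        assignment[op] = plc_policy(op, assignment, num_devices)  # stand-in for PLC
    return assignment


def first_candidate(candidates: List[str], assignment: Dict[str, int]) -> str:
    """Stand-in selection rule: take the first candidate in sorted order."""
    return candidates[0]


def least_loaded(op: str, assignment: Dict[str, int], num_devices: int) -> int:
    """Stand-in placement rule: pick the device holding the fewest ops so far."""
    load = [0] * num_devices
    for device in assignment.values():
        load[device] += 1
    return load.index(min(load))


if __name__ == "__main__":
    # A small diamond-shaped graph: a feeds b and c, which both feed d.
    diamond: Graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
    print(assign_devices(diamond, 2, first_candidate, least_loaded))
```

In a learned setting, the two callables would be replaced by networks that score candidates and devices from graph and system features; keeping them as separate arguments mirrors the separation between selecting an operation and placing it.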