AIOct 27, 2019Code
Long-term Joint Scheduling for Urban TrafficXianfeng Liang, Likang Wu, Joya Chen et al.
Recently, the traffic congestion in modern cities has become a growing worry for the residents. As presented in Baidu traffic report, the commuting stress index has reached surprising 1.973 in Beijing during rush hours, which results in longer trip time and increased vehicular queueing. Previous works have demonstrated that by reasonable scheduling, e.g, rebalancing bike-sharing systems and optimized bus transportation, the traffic efficiency could be significantly improved with little resource consumption. However, there are still two disadvantages that restrict their performance: (1) they only consider single scheduling in a short time, but ignoring the layout after first reposition, and (2) they only focus on the single transport. However, the multi-modal characteristics of urban public transportation are largely under-exploited. In this paper, we propose an efficient and economical multi-modal traffic scheduling scheme named JLRLS based on spatio -temporal prediction, which adopts reinforcement learning to obtain optimal long-term and joint schedule. In JLRLS, we combines multiple transportation to conduct scheduling by their own characteristics, which potentially helps the system to reach the optimal performance. Our implementation of an example by PaddlePaddle is available at https://github.com/bigdata-ustc/Long-term-Joint-Scheduling, with an explaining video at https://youtu.be/t5M2wVPhTyk.
OCFeb 10, 2020
SPAN: A Stochastic Projected Approximate Newton MethodXunpeng Huang, Xianfeng Liang, Zhengyang Liu et al.
Second-order optimization methods have desirable convergence properties. However, the exact Newton method requires expensive computation for the Hessian and its inverse. In this paper, we propose SPAN, a novel approximate and fast Newton method. SPAN computes the inverse of the Hessian matrix via low-rank approximation and stochastic Hessian-vector products. Our experiments on multiple benchmark datasets demonstrate that SPAN outperforms existing first-order and second-order optimization methods in terms of the convergence wall-clock time. Furthermore, we provide a theoretical analysis of the per-iteration complexity, the approximation error, and the convergence rate. Both the theoretical analysis and experimental results show that our proposed method achieves a better trade-off between the convergence rate and the per-iteration efficiency.
LGDec 30, 2019
Variance Reduced Local SGD with Lower Communication ComplexityXianfeng Liang, Shuheng Shen, Jingchang Liu et al.
To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants have been widely adopted, which apply multiple workers in parallel to speed up training. Among them, Local SGD has gained much attention due to its lower communication cost. Nevertheless, when the data distribution on workers is non-identical, Local SGD requires $O(T^{\frac{3}{4}} N^{\frac{3}{4}})$ communications to maintain its \emph{linear iteration speedup} property, where $T$ is the total number of iterations and $N$ is the number of workers. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to further reduce the communication complexity. Benefiting from eliminating the dependency on the gradient variance among workers, we theoretically prove that VRL-SGD achieves a \emph{linear iteration speedup} with a lower communication complexity $O(T^{\frac{1}{2}} N^{\frac{3}{2}})$ even if workers access non-identical datasets. We conduct experiments on three machine learning tasks, and the experimental results demonstrate that VRL-SGD performs impressively better than Local SGD when the data among workers are quite diverse.
LGJun 28, 2019
Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient DescentShuheng Shen, Linli Xu, Jingchang Liu et al.
With the increase in the amount of data and the expansion of model scale, distributed parallel training becomes an important and successful technique to address the optimization challenges. Nevertheless, although distributed stochastic gradient descent (SGD) algorithms can achieve a linear iteration speedup, they are limited significantly in practice by the communication cost, making it difficult to achieve a linear time speedup. In this paper, we propose a computation and communication decoupled stochastic gradient descent (CoCoD-SGD) algorithm to run computation and communication in parallel to reduce the communication cost. We prove that CoCoD-SGD has a linear iteration speedup with respect to the total computation capability of the hardware resources. In addition, it has a lower communication complexity and better time speedup comparing with traditional distributed SGD algorithms. Experiments on deep neural network training demonstrate the significant improvements of CoCoD-SGD: when training ResNet18 and VGG16 with 16 Geforce GTX 1080Ti GPUs, CoCoD-SGD is up to 2-3$\times$ faster than traditional synchronous SGD.