Satish Rao

3 papers

54 citations

Novelty: 70%

AI Score: 28

Ranked #155,865 of 205,806 authors (top 76%); #495 in DS (top 86%)

3 Papers

DS · Nov 20, 2015

Faster Parallel Solver for Positive Linear Programs via Dynamically-Bucketed Selective Coordinate Descent

Di Wang, Michael Mahoney, Nishanth Mohan et al.

We provide improved parallel approximation algorithms for the important class of packing and covering linear programs. In particular, we present new parallel $ε$-approximate packing and covering solvers which run in $\tilde{O}(1/ε^2)$ expected time, i.e., in expectation they take $\tilde{O}(1/ε^2)$ iterations and they do $\tilde{O}(N/ε^2)$ total work, where $N$ is the size of the constraint matrix and $ε$ is the error parameter, and where the $\tilde{O}$ hides logarithmic factors. To achieve our improvement, we introduce an algorithmic technique of broader interest: dynamically-bucketed selective coordinate descent (DB-SCD). At each step of the iterative optimization algorithm, the DB-SCD method dynamically buckets the coordinates of the gradient into those of roughly equal magnitude, and it updates all the coordinates in one of the buckets. This dynamically-bucketed updating permits us to take steps along several coordinates with similar-sized gradients, thereby permitting more appropriate step sizes at each step of the algorithm. In particular, this technique allows us to use in a straightforward manner the recent analysis from the breakthrough results of Allen-Zhu and Orecchia [2] to achieve our still-further improved bounds. More generally, this method addresses "interference" among coordinates, by which we mean the impact of the update of one coordinate on the gradients of other coordinates. Such interference is a core issue in parallelizing optimization routines that rely on smoothness properties. Since our DB-SCD method reduces interference via updating a selective subset of variables at each iteration, we expect it may also have more general applicability in optimization.
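The bucketing idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses a plain additive gradient step on an unconstrained problem rather than the packing and covering setting, and the names `bucket_by_magnitude` and `db_scd_step`, the factor-of-two bucket width, and the rule for choosing a bucket (largest squared gradient mass) are all assumptions made for illustration. The abstract only states that coordinates of roughly equal gradient magnitude are grouped and that all coordinates in one bucket are updated.

```python
import math
from collections import defaultdict


def bucket_by_magnitude(grad, tol=1e-12):
    """Group coordinate indices so that gradient magnitudes inside one
    bucket differ by less than a factor of two (assumed bucket width)."""
    buckets = defaultdict(list)
    for i, g in enumerate(grad):
        if abs(g) > tol:  # ignore (near-)zero gradient coordinates
            buckets[math.floor(math.log2(abs(g)))].append(i)
    return dict(buckets)


def db_scd_step(x, grad, step_size):
    """One selective coordinate step: update only the coordinates of a
    single bucket and leave every other coordinate unchanged."""
    buckets = bucket_by_magnitude(grad)
    if not buckets:
        return list(x), []
    # Hypothetical selection rule: the bucket with the largest squared
    # gradient mass. The abstract does not specify how a bucket is chosen.
    chosen = max(buckets.values(),
                 key=lambda idx: sum(grad[i] ** 2 for i in idx))
    new_x = list(x)
    for i in chosen:
        new_x[i] -= step_size * grad[i]
    return new_x, chosen


if __name__ == "__main__":
    x = [0.0, 0.0, 0.0, 0.0, 0.0]
    grad = [1.0, 1.5, 0.1, -1.2, 2.0]
    new_x, chosen = db_scd_step(x, grad, step_size=0.5)
    print(chosen, new_x)
```

Because every coordinate in the chosen bucket has a gradient of similar size, a single step size suits all of them, which is the property the abstract credits for permitting more appropriate step sizes.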

DS · Jun 19, 2017

Capacity Releasing Diffusion for Speed and Locality

Di Wang, Kimon Fountoulakis, Monika Henzinger et al.

Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass "too aggressively," thereby failing to find the "right" clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an $O(\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good (but not very good) clusters.

DS · Oct 6, 2015

Unified Acceleration Method for Packing and Covering Problems via Diameter Reduction

Di Wang, Satish Rao, Michael W. Mahoney

The linear coupling method was introduced recently by Allen-Zhu and Orecchia for solving convex optimization problems with first order methods, and it provides a conceptually simple way to integrate a gradient descent step and a mirror descent step in each iteration. The high-level approach of the linear coupling method is very flexible, and it has shown initial promise by providing improved algorithms for packing and covering linear programs. Somewhat surprisingly, however, while the dependence of the convergence rate on the error parameter $ε$ for packing problems was improved to $O(1/ε)$, which corresponds to what accelerated gradient methods are designed to achieve, the dependence for covering problems was only improved to $O(1/ε^{1.5})$, and even that required a different, more complicated algorithm. Given the close connections between packing and covering problems, and since previous algorithms for these closely related problems have led to the same $ε$ dependence, this discrepancy is surprising, and it leaves open the question of the exact role that the linear coupling is playing in coordinating the complementary gradient and mirror descent steps of the algorithm. In this paper, we clarify these issues for linear coupling algorithms for packing and covering linear programs, illustrating that the linear coupling method can lead to improved $O(1/ε)$ dependence for both packing and covering problems in a unified manner, i.e., with the same algorithm and almost identical analysis. Our main technical result is a novel diameter reduction method for covering problems that is of independent interest and that may be useful in applying the accelerated linear coupling method to other combinatorial problems.
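To make the "gradient descent step plus mirror descent step" structure concrete, here is a minimal sketch of linear coupling in its simplest setting: an unconstrained smooth convex function with the Euclidean mirror map, so the mirror step reduces to a plain gradient step with a larger step size. This is not the packing and covering algorithm from the paper, and it does not include the diameter reduction technique. The function name `linear_coupling`, the test quadratic, and the schedule $\alpha_{k+1} = (k+2)/(2L)$, $\tau_k = 1/(\alpha_{k+1} L)$ are assumptions chosen for illustration, following the standard smooth unconstrained presentation of the method.

```python
def linear_coupling(grad, x0, L, num_iters):
    """Sketch of linear coupling for an L-smooth convex function.

    y tracks the gradient-descent iterate, z the mirror-descent iterate
    (Euclidean mirror map assumed), and x is their convex combination.
    """
    y = list(x0)
    z = list(x0)
    for k in range(num_iters):
        alpha = (k + 2) / (2.0 * L)  # mirror step size, grows with k
        tau = 1.0 / (alpha * L)      # coupling weight, equals 2 / (k + 2)
        # Couple the two iterates linearly.
        x = [tau * zi + (1.0 - tau) * yi for zi, yi in zip(z, y)]
        g = grad(x)
        # Gradient descent step from x with step size 1 / L.
        y = [xi - gi / L for xi, gi in zip(x, g)]
        # Mirror descent step on z using the same gradient.
        z = [zi - alpha * gi for zi, gi in zip(z, g)]
    return y


if __name__ == "__main__":
    # Hypothetical test problem: f(x) = 0.5 * sum(a_i * (x_i - c_i)^2).
    a = [1.0, 10.0]
    c = [1.0, -2.0]

    def f(x):
        return 0.5 * sum(ai * (xi - ci) ** 2 for ai, xi, ci in zip(a, x, c))

    def grad_f(x):
        return [ai * (xi - ci) for ai, xi, ci in zip(a, x, c)]

    y = linear_coupling(grad_f, [0.0, 0.0], L=max(a), num_iters=200)
    print(y, f(y))
```

Both steps use the gradient at the same coupled point `x`: the short step keeps `y` making steady progress, while the longer step on `z` supplies the acceleration. The abstract's question concerns how this coordination behaves for packing versus covering problems.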