Kirk Pruhs

h-index39

7papers

39citations

Novelty53%

AI Score43

Ranked #56,636 of 194,257 authors (top 29%)#202 in DS (top 42%)

7 Papers

8.7DSApr 4

Approximation Algorithms for Matroid-Intersection Coloring with Applications to Rota's Basis Conjecture

Stephen Arndt, Benjamin Moseley, Kirk Pruhs et al.

We study algorithmic matroid intersection coloring. Given $k$ matroids on a common ground set $U$ of $n$ elements, the goal is to partition $U$ into the fewest number of color classes, where each color class is independent in all matroids. It is known that $2Ï_{\max}$ colors suffice to color the intersection of two matroids, $(2k-1)Ï_{\max}$ colors suffice for general $k$, where $Ï_{\max}$ is the maximum chromatic number of the individual matroids. However, these results are non-constructive, leveraging techniques such as topological Hall's theorem and Sperner's Lemma. We provide the first polynomial-time algorithms to color two or more general matroids where the approximation ratio depends only on $k$ and, in particular, is independent of $n$. For two matroids, we constructively match the $2Ï_{\max}$ existential bound, yielding a 2-approximation for the Matroid Intersection Coloring problem. For $k$ matroids we achieve a $(k^2-k)Ï_{\max}$ coloring, which is the first $O(1)$-approximation for constant $k$. Our approach introduces a novel matroidal structure we call a \emph{flexible decomposition}. We use this to formally reduce general matroid intersection coloring to graph coloring while avoiding the limitations of partition reduction techniques, and without relying on non-constructive topological machinery. Furthermore, we give a \emph{fully polynomial randomized approximation scheme} (FPRAS) for coloring the intersection of two matroids when $Ï_{\max}$ is large. This yields the first polynomial-time constructive algorithm for an asymptotic variant of Rota's Basis Conjecture. This constructivizes Montgomery and Sauermann's recent asymptotic breakthrough and generalizes it to arbitrary matroids.

9.1DSMar 15

Indirect Coflow Scheduling

Alexander Lindermayr, Kirk Pruhs, Andréa W. Richa et al.

We consider routing in reconfigurable networks, which is also known as coflow scheduling in the literature. The algorithmic literature generally (perhaps implicitly) assumes that the amount of data to be transferred is large. Thus the standard way to model a collection of requested data transfers is by an integer demand matrix $D$, where the entry in row $i$ and column $j$ of $D$ is an integer representing the amount of information that the application wants to send from machine/node $i$ to machine/node $j$. A feasible coflow schedule is then a sequence of matchings, which represent the sequence of data transfers that covers $D$. In this work, we investigate coflow scheduling when the size of some of the requested data transfers may be small relative to the amount of data that can be transferred in one round. fractional matchings and/or that employ indirect routing, and compare the relative utility of these options. We design algorithms that perform much better for small demands than the algorithms in the literature that were designed for large data transfers.

1.2DBJul 25, 2021

Relational Boosted Regression Trees

Sonia Cromp, Alireza Samadian, Kirk Pruhs

Many tasks use data housed in relational databases to train boosted regression tree models. In this paper, we give a relational adaptation of the greedy algorithm for training boosted regression trees. For the subproblem of calculating the sum of squared residuals of the dataset, which dominates the runtime of the boosting algorithm, we provide a $(1 + ε)$-approximation using the tensor sketch technique. Employing this approximation within the relational boosted regression trees algorithm leads to learning similar model parameters, but with asymptotically better runtime.

5.9DSAug 1, 2020

Relational Algorithms for k-means Clustering

Benjamin Moseley, Kirk Pruhs, Alireza Samadian et al.

This paper gives a k-means approximation algorithm that is efficient in the relational algorithms model. This is an algorithm that operates directly on a relational database without performing a join to convert it to a matrix whose rows represent the data points. The running time is potentially exponentially smaller than $N$, the number of data points to be clustered that the relational database represents. Few relational algorithms are known and this paper offers techniques for designing relational algorithms as well as characterizing their limitations. We show that given two data points as cluster centers, if we cluster points according to their closest centers, it is NP-Hard to approximate the number of points in the clusters on a general relational input. This is trivial for conventional data inputs and this result exemplifies that standard algorithmic techniques may not be directly applied when designing an efficient relational algorithm. This paper then introduces a new method that leverages rejection sampling and the $k$-means++ algorithm to construct an O(1)-approximate k-means solution.

5.1DSMay 11, 2020

A Relational Gradient Descent Algorithm For Support Vector Machine Training

Mahmoud Abo-Khamis, Sungjin Im, Benjamin Moseley et al.

We consider gradient descent like algorithms for Support Vector Machine (SVM) training when the data is in relational form. The gradient of the SVM objective can not be efficiently computed by known techniques as it suffers from the ``subtraction problem''. We first show that the subtraction problem can not be surmounted by showing that computing any constant approximation of the gradient of the SVM objective function is $\#P$-hard, even for acyclic joins. We, however, circumvent the subtraction problem by restricting our attention to stable instances, which intuitively are instances where a nearly optimal solution remains nearly optimal if the points are perturbed slightly. We give an efficient algorithm that computes a ``pseudo-gradient'' that guarantees convergence for stable instances at a rate comparable to that achieved by using the actual gradient. We believe that our results suggest that this sort of stability the analysis would likely yield useful insight in the context of designing algorithms on relational data for other learning problems in which the subtraction problem arises.

4.3DSMar 24, 2020

Approximate Aggregate Queries Under Additive Inequalities

Mahmoud Abo-Khamis, Sungjin Im, Benjamin Moseley et al.

We consider the problem of evaluating certain types of functional aggregation queries on relational data subject to additive inequalities. Such aggregation queries, with a smallish number of additive inequalities, arise naturally/commonly in many applications, particularly in learning applications. We give a relatively complete categorization of the computational complexity of such problems. We first show that the problem is NP-hard, even in the case of one additive inequality. Thus we turn to approximating the query. Our main result is an efficient algorithm for approximating, with arbitrarily small relative error, many natural aggregation queries with one additive inequality. We give examples of natural queries that can be efficiently solved using this algorithm. In contrast, we show that the situation with two additive inequalities is quite different, by showing that it is NP-hard to evaluate simple aggregation queries, with two additive inequalities, with any bounded relative error.

7.1LGMay 26, 2019

On Coresets for Regularized Loss Minimization

Ryan R. Curtin, Sungjin Im, Ben Moseley et al.

We design and mathematically analyze sampling-based algorithms for regularized loss minimization problems that are implementable in popular computational models for large data, in which the access to the data is restricted in some way. Our main result is that if the regularizer's effect does not become negligible as the norm of the hypothesis scales, and as the data scales, then a uniform sample of modest size is with high probability a coreset. In the case that the loss function is either logistic regression or soft-margin support vector machines, and the regularizer is one of the common recommended choices, this result implies that a uniform sample of size $O(d \sqrt{n})$ is with high probability a coreset of $n$ points in $\Re^d$. We contrast this upper bound with two lower bounds. The first lower bound shows that our analysis of uniform sampling is tight; that is, a smaller uniform sample will likely not be a core set. The second lower bound shows that in some sense uniform sampling is close to optimal, as significantly smaller core sets do not generally exist.