Joseph K. Bradley

h-index13

4papers

2,074citations

Novelty44%

AI Score26

Ranked #162,692 of 194,257 authors (top 84%)#35,609 in LG (top 89%)

4 Papers

34.0LGMay 26, 2015

MLlib: Machine Learning in Apache Spark

Xiangrui Meng, Joseph Bradley, Burak Yavuz et al.

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced a rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed.

4.3ITAug 26, 2015

SPRIGHT: A Fast and Robust Framework for Sparse Walsh-Hadamard Transform

Xiao Li, Joseph K. Bradley, Sameer Pawar et al.

We consider the problem of computing the Walsh-Hadamard Transform (WHT) of some $N$-length input vector in the presence of noise, where the $N$-point Walsh spectrum is $K$-sparse with $K = {O}(N^δ)$ scaling sub-linearly in the input dimension $N$ for some $0<δ<1$. Over the past decade, there has been a resurgence in research related to the computation of Discrete Fourier Transform (DFT) for some length-$N$ input signal that has a $K$-sparse Fourier spectrum. In particular, through a sparse-graph code design, our earlier work on the Fast Fourier Aliasing-based Sparse Transform (FFAST) algorithm computes the $K$-sparse DFT in time ${O}(K\log K)$ by taking ${O}(K)$ noiseless samples. Inspired by the coding-theoretic design framework, Scheibler et al. proposed the Sparse Fast Hadamard Transform (SparseFHT) algorithm that elegantly computes the $K$-sparse WHT in the absence of noise using ${O}(K\log N)$ samples in time ${O}(K\log^2 N)$. However, the SparseFHT algorithm explicitly exploits the noiseless nature of the problem, and is not equipped to deal with scenarios where the observations are corrupted by noise. Therefore, a question of critical interest is whether this coding-theoretic framework can be made robust to noise. Further, if the answer is yes, what is the extra price that needs to be paid for being robust to noise? In this paper, we show, quite interestingly, that there is {\it no extra price} that needs to be paid for being robust to noise other than a constant factor. In other words, we can maintain the same sample complexity ${O}(K\log N)$ and the computational complexity ${O}(K\log^2 N)$ as those of the noiseless case, using our SParse Robust Iterative Graph-based Hadamard Transform (SPRIGHT) algorithm.

23.1LGMay 6, 2015

Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence

Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley et al.

Data in the form of pairwise comparisons arises in many domains, including preference elicitation, sporting competitions, and peer grading among others. We consider parametric ordinal models for such pairwise comparison data involving a latent vector $w^* \in \mathbb{R}^d$ that represents the "qualities" of the $d$ items being compared; this class of models includes the two most widely used parametric models--the Bradley-Terry-Luce (BTL) and the Thurstone models. Working within a standard minimax framework, we provide tight upper and lower bounds on the optimal error in estimating the quality score vector $w^*$ under this class of models. The bounds depend on the topology of the comparison graph induced by the subset of pairs being compared via its Laplacian spectrum. Thus, in settings where the subset of pairs may be chosen, our results provide principled guidelines for making this choice. Finally, we compare these error rates to those under cardinal measurement models and show that the error rates in the ordinal and cardinal settings have identical scalings apart from constant pre-factors.

10.7MLJun 25, 2014

When is it Better to Compare than to Score?

Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley et al.

When eliciting judgements from humans for an unknown quantity, one often has the choice of making direct-scoring (cardinal) or comparative (ordinal) measurements. In this paper we study the relative merits of either choice, providing empirical and theoretical guidelines for the selection of a measurement scheme. We provide empirical evidence based on experiments on Amazon Mechanical Turk that in a variety of tasks, (pairwise-comparative) ordinal measurements have lower per sample noise and are typically faster to elicit than cardinal ones. Ordinal measurements however typically provide less information. We then consider the popular Thurstone and Bradley-Terry-Luce (BTL) models for ordinal measurements and characterize the minimax error rates for estimating the unknown quantity. We compare these minimax error rates to those under cardinal measurement models and quantify for what noise levels ordinal measurements are better. Finally, we revisit the data collected from our experiments and show that fitting these models confirms this prediction: for tasks where the noise in ordinal measurements is sufficiently low, the ordinal approach results in smaller errors in the estimation.