Mark D. Reid

6 papers · 325 citations


6 Papers

ML · Jun 10, 2016

Causal Bandits: Learning Good Interventions via Causal Inference

Finnian Lattimore, Tor Lattimore, Mark D. Reid

We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-arm bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.
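The extra feedback can be illustrated with a minimal sketch. It assumes the simplest "parallel" causal graph: N independent binary causes X_i of a reward Y and no confounding, so that E[Y | do(X_i = j)] = E[Y | X_i = j]. Under that assumption, every purely observational round updates the estimate of every arm do(X_i = j) at once, whereas a standard bandit learns only about the arm it pulled. The simulator, the reward model, and the name `estimate_arms_from_observation` are hypothetical. This is not the paper's algorithm, and it ignores what happens when some values X_i = j are rarely observed.

```python
import random

def simulate_round(rng, probs):
    """Hypothetical environment: independent binary causes X_i; reward Y depends on X_0 only."""
    x = [1 if rng.random() < p else 0 for p in probs]
    y = 1 if rng.random() < 0.3 + 0.4 * x[0] else 0
    return x, y

def estimate_arms_from_observation(rng, probs, rounds):
    """Estimate E[Y | do(X_i = j)] for all 2N arms from purely observational rounds.

    Valid only under the parallel-graph assumption, where conditioning equals intervening.
    """
    n = len(probs)
    totals = {(i, j): 0 for i in range(n) for j in (0, 1)}
    counts = {(i, j): 0 for i in range(n) for j in (0, 1)}
    for _ in range(rounds):
        x, y = simulate_round(rng, probs)
        # A single observation is informative about N arms simultaneously.
        for i, xi in enumerate(x):
            totals[(i, xi)] += y
            counts[(i, xi)] += 1
    return {arm: totals[arm] / counts[arm] for arm in totals if counts[arm] > 0}

rng = random.Random(0)
estimates = estimate_arms_from_observation(rng, probs=[0.5] * 5, rounds=2000)
best_arm = max(estimates, key=estimates.get)  # (variable index, value) with highest estimate
print(best_arm)
```

In this toy setting, the arm do(X_0 = 1) has true value 0.7 while every other arm has value at most 0.5, so 2000 observational rounds are enough to single it out without ever intervening.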

ML · Feb 9, 2016

Compliance-Aware Bandits

Nicolás Della Penna, Mark D. Reid, David Balduzzi

Motivated by clinical trials, we study bandits with observable non-compliance. At each step, the learner chooses an arm and then, instead of observing only the reward, also observes the action that actually took place. We show that such non-compliance can be helpful or hurtful to the learner in general. Unfortunately, naively incorporating compliance information into bandit algorithms loses guarantees on sublinear regret. We present hybrid algorithms that maintain regret bounds up to a multiplicative factor and can incorporate compliance information. Simulations based on real data from the International Stroke Trial show the practical potential of these algorithms.

LG · Jul 9, 2015

Fast rates in statistical and online learning

Tim van Erven, Peter D. Grünwald, Nishant A. Mehta et al.

The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning: a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single unifying condition that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable for unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting.
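To make the central condition concrete, here is a small sketch for a finite model under log loss. It checks the condition in the form commonly stated for this line of work, namely that some f* in the model satisfies E_{Z~P}[exp(η(ℓ_{f*}(Z) − ℓ_f(Z)))] ≤ 1 for every f in the model. The Bernoulli model, the data distribution, and the name `central_witness` are hypothetical. In this well-specified example the condition holds at η = 1, because the expectation reduces to Σ_z p_f(z) = 1, but no witness exists at η = 2.

```python
import math

def log_loss(p_one, z):
    """Log loss of a Bernoulli(p_one) prediction on outcome z in {0, 1}."""
    return -math.log(p_one if z == 1 else 1.0 - p_one)

def central_witness(p_true, model, eta, tol=1e-12):
    """Return the index of some f* satisfying the eta-central condition, or None.

    Checks E_{Z~P}[exp(eta * (loss(f*, Z) - loss(f, Z)))] <= 1 for every f in the model,
    where P = Bernoulli(p_true) and the model is a finite list of Bernoulli parameters.
    """
    dist = {1: p_true, 0: 1.0 - p_true}
    for k, f_star in enumerate(model):
        ok = True
        for f in model:
            moment = sum(
                prob * math.exp(eta * (log_loss(f_star, z) - log_loss(f, z)))
                for z, prob in dist.items()
            )
            if moment > 1.0 + tol:  # small tolerance for floating-point error
                ok = False
                break
        if ok:
            return k
    return None

model = [0.3, 0.5, 0.7]
print(central_witness(0.7, model, eta=1.0))  # well-specified: witness is Bernoulli(0.7)
print(central_witness(0.7, model, eta=2.0))  # condition fails at this larger eta
```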

GT · Oct 1, 2014

Risk Dynamics in Trade Networks

Rafael M. Frongillo, Mark D. Reid

We introduce a new framework to model interactions among agents which seek to trade to minimize their risk with respect to some future outcome. We quantify this risk using the concept of risk measures from finance, and introduce a class of trade dynamics which allow agents to trade contracts contingent upon the future outcome. We then show that these trade dynamics exactly correspond to a variant of randomized coordinate descent. By extending the analysis of these coordinate descent methods to account for our more organic setting, we are able to show convergence rates for very general trade dynamics, showing that the market or network converges to a unique steady state. Applying these results to prediction markets, we expand on recent results by adding convergence rates and general aggregation properties. Finally, we illustrate the generality of our framework by applying it to agent interactions on a scale-free network.
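The trade-as-coordinate-descent idea can be shown with a toy sketch. It assumes quadratic "risks" ρ_i(x_i) = (a_i/2)(x_i − c_i)^2 in place of the financial risk measures the paper uses. At each step a random pair of agents trades the amount of a single contract that minimizes their combined risk. Because trades conserve the total position, this amounts to randomized coordinate descent over pairs of coordinates on the constraint Σ x_i = const. The names `pairwise_trade`, `run_market`, and `steady_state` and all parameter values are hypothetical. The unique steady state is the point where every agent's marginal risk a_i(x_i − c_i) is equal.

```python
import random

def pairwise_trade(x, a, c, i, j):
    """Move t units from agent j to agent i, choosing t to minimize their combined quadratic risk."""
    t = (a[j] * (x[j] - c[j]) - a[i] * (x[i] - c[i])) / (a[i] + a[j])
    x[i] += t
    x[j] -= t

def run_market(x0, a, c, steps, seed=0):
    """Randomized pairwise trading: each step is an exact minimization over two coordinates."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        i, j = rng.sample(range(len(x)), 2)  # random pair of distinct agents
        pairwise_trade(x, a, c, i, j)
    return x

def steady_state(x0, a, c):
    """Closed-form optimum: equal marginal risk a_i (x_i - c_i) = lam, with total position conserved."""
    lam = (sum(x0) - sum(c)) / sum(1.0 / ai for ai in a)
    return [ci + lam / ai for ai, ci in zip(a, c)]

a = [1.0, 2.0, 4.0, 0.5]
c = [0.0, 1.0, -1.0, 2.0]
x0 = [3.0, 0.0, 0.0, 0.0]
x = run_market(x0, a, c, steps=2000)
target = steady_state(x0, a, c)
print(max(abs(xi - ti) for xi, ti in zip(x, target)))  # distance to the steady state
```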

LG · Jun 24, 2014

Generalized Mixability via Entropic Duality

Mark D. Reid, Rafael M. Frongillo, Robert C. Williamson et al.

Mixability is a property of a loss which characterizes when fast convergence is possible in the game of prediction with expert advice. We show that a key property of mixability generalizes, and the exp and log operations present in the usual theory are not as special as one might have thought. In doing this we introduce a more general notion of $Φ$-mixability where $Φ$ is a general entropy (i.e., any convex function on probabilities). We show how a property shared by the convex dual of any such entropy yields a natural algorithm (the minimizer of a regret bound) which, analogous to the classical aggregating algorithm, is guaranteed a constant regret when used with $Φ$-mixable losses. We characterize precisely which $Φ$ have $Φ$-mixable losses and put forward a number of conjectures about the optimality and relationships between different choices of entropy.
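In the classical special case, where $Φ$ is the (negative) Shannon entropy, the convex-dual step reduces to exponential weights. The distribution over experts that minimizes cumulative loss plus a scaled KL-divergence penalty to the prior is the Gibbs distribution w_i ∝ π_i exp(−η L_i), and the minimum value is −(1/η) log Σ_i π_i exp(−η L_i). The sketch below covers only that special case. Its function names, losses, and η are hypothetical. It computes the minimizer, checks the closed-form optimal value, and confirms that random alternative distributions do no better.

```python
import math
import random

def gibbs_weights(prior, losses, eta):
    """Exponential-weights distribution: w_i proportional to prior_i * exp(-eta * L_i)."""
    scores = [p * math.exp(-eta * L) for p, L in zip(prior, losses)]
    z = sum(scores)
    return [s / z for s in scores]

def regularized_loss(w, prior, losses, eta):
    """<w, L> + (1/eta) * KL(w || prior); coordinates with w_i = 0 contribute 0 to the KL term."""
    kl = sum(wi * math.log(wi / pi) for wi, pi in zip(w, prior) if wi > 0)
    return sum(wi * L for wi, L in zip(w, losses)) + kl / eta

prior = [0.25, 0.25, 0.25, 0.25]
losses = [1.0, 0.2, 3.0, 0.7]  # hypothetical cumulative losses of four experts
eta = 0.5

w_star = gibbs_weights(prior, losses, eta)
best = regularized_loss(w_star, prior, losses, eta)
# Closed-form optimal value: -(1/eta) * log sum_i prior_i * exp(-eta * L_i).
dual_value = -math.log(sum(p * math.exp(-eta * L) for p, L in zip(prior, losses))) / eta
print(best, dual_value)

rng = random.Random(1)

def random_distribution(k):
    """Draw an arbitrary (non-uniform) probability vector of length k."""
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

no_alternative_beats_gibbs = all(
    regularized_loss(random_distribution(len(prior)), prior, losses, eta) >= best - 1e-12
    for _ in range(1000)
)
print(no_alternative_beats_gibbs)
```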

LG · Mar 10, 2014

Generalised Mixability, Constant Regret, and Bayesian Updating

Mark D. Reid, Rafael M. Frongillo, Robert C. Williamson

Mixability of a loss is known to characterise when constant regret bounds are achievable in games of prediction with expert advice through the use of Vovk's aggregating algorithm. We provide a new interpretation of mixability via convex analysis that highlights the role of the Kullback-Leibler divergence in its definition. This naturally generalises to what we call $Φ$-mixability where the Bregman divergence $D_Φ$ replaces the KL divergence. We prove that losses that are $Φ$-mixable also enjoy constant regret bounds via a generalised aggregating algorithm that is similar to mirror descent.
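The link to Bayesian updating is easiest to see for log loss, which is mixable with η = 1. There, Vovk's aggregating algorithm predicts with the Bayesian mixture of the experts' forecasts, and the posterior becomes the next round's weights. Its cumulative log loss then exceeds the best expert's by at most ln N for N experts under a uniform prior, whatever the outcomes. The sketch below checks that bound on a hypothetical outcome sequence with constant Bernoulli experts. It illustrates only the classical KL case, not the Bregman-divergence generalisation, and `aggregate_log_loss` is a hypothetical name.

```python
import math

def aggregate_log_loss(expert_probs, outcomes):
    """Bayesian mixture (aggregating algorithm with eta = 1) over experts predicting P(y = 1).

    Returns the mixture's cumulative log loss and each expert's cumulative log loss.
    """
    n = len(expert_probs)
    weights = [1.0 / n] * n  # uniform prior over experts
    mix_total = 0.0
    expert_totals = [0.0] * n
    for y in outcomes:
        # Each expert's likelihood for the observed outcome.
        lik = [p if y == 1 else 1.0 - p for p in expert_probs]
        mix_prob = sum(w * l for w, l in zip(weights, lik))  # mixture's probability of y
        mix_total += -math.log(mix_prob)
        expert_totals = [L - math.log(l) for L, l in zip(expert_totals, lik)]
        # Bayes' rule: the posterior becomes the next round's weights.
        weights = [w * l / mix_prob for w, l in zip(weights, lik)]
    return mix_total, expert_totals

experts = [0.1, 0.4, 0.6, 0.9]
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1] * 5  # hypothetical outcome sequence
mix_total, expert_totals = aggregate_log_loss(experts, outcomes)
regret = mix_total - min(expert_totals)
print(regret, math.log(len(experts)))  # regret never exceeds ln N
```

The bound follows because the mixture's total loss is −ln Σ_i (1/N) Π_t p_i(y_t), which is at most −ln((1/N) max_i Π_t p_i(y_t)) = ln N + min_i L_i.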