Mahdi Milani Fard

h-index: 8

8 papers

188 citations

Novelty: 56%

AI Score: 28

Ranked #150,361 of 194,257 authors (top 77%); #33,075 in LG (top 82%)

8 Papers

5.3 | ML | Feb 4, 2022

Distribution Embedding Networks for Generalization from a Diverse Set of Classification Tasks

Lang Liu, Mahdi Milani Fard, Sen Zhao

We propose Distribution Embedding Networks (DEN) for classification with small data. In the spirit of meta-learning, DEN learns from a diverse set of training tasks with the goal of generalizing to unseen target tasks. Unlike existing approaches, which require the inputs of training and target tasks to have the same dimension with possibly similar distributions, DEN allows training and target tasks to live in heterogeneous input spaces. This is especially useful for tabular-data tasks where labeled data from related tasks are scarce. DEN uses a three-block architecture: a covariate transformation block followed by a distribution embedding block and then a classification block. We provide theoretical insights to show that this architecture allows the embedding and classification blocks to be fixed after pre-training on a diverse set of tasks; only the covariate transformation block, which has relatively few parameters, needs to be fine-tuned for each new task. To facilitate training, we also propose an approach to synthesize binary classification tasks, and demonstrate in numerical studies that DEN outperforms existing methods on a number of synthetic and real tasks.

8.4 | LG | Feb 18, 2021 | Code

Optimizing Black-box Metrics with Iterative Example Weighting

Gaurush Hiranandani, Jatin Mathur, Harikrishna Narasimhan et al.

We consider learning to optimize a classification metric defined by a black-box function of the confusion matrix. Such black-box learning settings are ubiquitous, for example, when the learner only has query access to the metric of interest, or in noisy-label and domain adaptation applications where the learner must evaluate the metric via performance evaluation using a small validation sample. Our approach is to adaptively learn example weights on the training dataset such that the resulting weighted objective best approximates the metric on the validation sample. We show how to model and estimate the example weights and use them to iteratively post-shift a pre-trained class probability estimator to construct a classifier. We also analyze the resulting procedure's statistical properties. Experiments on various label noise, domain shift, and fair classification setups confirm that our proposal compares favorably to the state-of-the-art baselines for each application.

6.5 | LG | Feb 20, 2020

Optimizing Black-box Metrics with Adaptive Surrogates

Qijia Jiang, Olaoluwa Adigun, Harikrishna Narasimhan et al.

We address the problem of training models with black-box and hard-to-optimize metrics by expressing the metric as a monotonic function of a small number of easy-to-optimize surrogates. We pose the training problem as an optimization over a relaxed surrogate space, which we solve by estimating local gradients for the metric and performing inexact convex projections. We analyze gradient estimates based on finite differences and local linear interpolations, and show convergence of our approach under smoothness assumptions with respect to the surrogates. Experimental results on classification and ranking problems verify that the proposal performs on par with methods that know the mathematical formulation, and adds notable value when the form of the metric is unknown.
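
As a rough illustration of the finite-difference idea mentioned in the abstract, the sketch below estimates the gradient of a black-box metric with respect to a vector of surrogate values using central differences. It is not the paper's estimator: the function name `finite_difference_gradient`, the `step` parameter, and the toy metric are all assumptions made for this example.

```python
from typing import Callable, List


def finite_difference_gradient(
    metric: Callable[[List[float]], float],
    surrogates: List[float],
    step: float = 1e-3,
) -> List[float]:
    """Estimate d(metric)/d(surrogate_i) with central differences.

    `metric` is treated as a black box: it is only ever queried, never
    differentiated analytically.
    """
    grad = []
    for i in range(len(surrogates)):
        up = list(surrogates)
        down = list(surrogates)
        up[i] += step
        down[i] -= step
        grad.append((metric(up) - metric(down)) / (2.0 * step))
    return grad


# Hypothetical black-box metric, monotonic in each surrogate for s[1] > 0.
def toy_metric(s: List[float]) -> float:
    return 2.0 * s[0] + s[1] ** 2


grad = finite_difference_gradient(toy_metric, [0.5, 1.0])
```

Each gradient estimate costs two metric queries per surrogate, which is one reason a small number of surrogates matters in this setting.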

18.4 | LG | Jun 28, 2018

Proxy Fairness

Maya Gupta, Andrew Cotter, Mahdi Milani Fard et al.

We consider the problem of improving fairness when one lacks access to a dataset labeled with protected groups, making it difficult to take advantage of strategies that can improve fairness but require protected group labels, either at training or runtime. To address this, we investigate improving fairness metrics for proxy groups, and test whether doing so results in improved fairness for the true sensitive groups. Results on benchmark and real-world datasets demonstrate that such a proxy fairness strategy can work well in practice. However, we caution that the effectiveness likely depends on the choice of fairness metric, as well as how aligned the proxy groups are with the true protected groups in terms of the constrained model parameters.

11.6 | ML | May 27, 2018

Metric-Optimized Example Weights

Sen Zhao, Mahdi Milani Fard, Harikrishna Narasimhan et al.

Real-world machine learning applications often have complex test metrics, and may have training and test data that are not identically distributed. Motivated by known connections between complex test metrics and cost-weighted learning, we propose addressing these issues by using a weighted loss function with a standard loss, where the weights on the training examples are learned to optimize the test metric on a validation set. These metric-optimized example weights can be learned for any test metric, including black box and customized ones for specific applications. We illustrate the performance of the proposed method on diverse public benchmark datasets and real-world applications. We also provide a generalization bound for the method.
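
To make "a weighted loss function with a standard loss" concrete, here is a minimal sketch of a per-example weighted log loss for binary labels. The function name `weighted_log_loss` and the normalization by the total weight are assumptions for illustration; how the weights are learned against the validation metric is the paper's contribution and is not shown here.

```python
import math
from typing import Sequence


def weighted_log_loss(
    labels: Sequence[int],
    probs: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted average of the binary log loss over training examples."""
    total_weight = sum(weights)
    loss = 0.0
    for y, p, w in zip(labels, probs, weights):
        # Clip probabilities so that log() stays finite.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        loss += w * -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / total_weight


# Uniform weights recover the ordinary mean log loss;
# a zero weight removes an example from the objective.
uniform = weighted_log_loss([1, 0], [0.9, 0.2], [1.0, 1.0])
first_only = weighted_log_loss([1, 0], [0.9, 0.2], [1.0, 0.0])
```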

15.9 | LG | Dec 1, 2013

Efficient Learning and Planning with Compressed Predictive States

William L. Hamilton, Mahdi Milani Fard, Joelle Pineau

Predictive state representations (PSRs) offer an expressive framework for modelling partially observable systems. By compactly representing systems as functions of observable quantities, the PSR learning approach avoids using local-minima-prone expectation-maximization and instead employs a globally optimal moment-based algorithm. Moreover, since PSRs do not require a predetermined latent state structure as an input, they offer an attractive framework for model-based reinforcement learning when agents must plan without a priori access to a system model. Unfortunately, the expressiveness of PSRs comes with significant computational cost, and this cost is a major factor inhibiting the use of PSRs in applications. To alleviate this shortcoming, we introduce the notion of compressed PSRs (CPSRs). The CPSR learning approach combines recent advancements in dimensionality reduction, incremental matrix decomposition, and compressed sensing. We show how this approach provides a principled avenue for learning accurate approximations of PSRs, drastically reducing the computational costs associated with learning while also providing effective regularization. Going further, we propose a planning framework that exploits these learned models, and we show that this approach facilitates model-learning and planning in large, complex, partially observable domains, a task that is infeasible without the principled use of compression.

3.8 | LG | Jul 23, 2012

Bellman Error Based Feature Generation using Random Projections on Sparse Spaces

Mahdi Milani Fard, Yuri Grinberg, Amir-massoud Farahmand et al.

We address the problem of automatic generation of features for value function approximation. Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration. We propose a simple, fast and robust algorithm based on random projections to generate BEBFs for sparse feature spaces. We provide a finite sample analysis of the proposed method, and prove that projections logarithmic in the dimension of the original space are enough to guarantee contraction in the error. Empirical results demonstrate the strength of this method.
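
The random-projection step can be pictured with the following sketch, which maps a sparse feature vector into a much smaller space using a Gaussian projection whose columns are generated on demand. Everything here is an assumption for illustration (the name `random_projection`, the dictionary encoding of sparse vectors, the per-index seeding scheme); the paper's exact projection and the subsequent Bellman-error regression are not reproduced.

```python
import math
import random
from typing import Dict, List


def random_projection(
    sparse_x: Dict[int, float], target_dim: int, seed: int = 0
) -> List[float]:
    """Project a sparse vector {index: value} down to target_dim dimensions.

    The column of the Gaussian projection matrix for each feature index is
    regenerated from a seed derived from that index, so the map is linear
    and deterministic, and the cost scales with the number of nonzeros.
    """
    out = [0.0] * target_dim
    scale = math.sqrt(target_dim)
    for idx, val in sparse_x.items():
        rng = random.Random(seed * 1_000_003 + idx)  # column for this index
        for j in range(target_dim):
            out[j] += val * rng.gauss(0.0, 1.0) / scale
    return out


# A vector with two nonzero features in a very large original space.
projected = random_projection({3: 1.0, 1_000_000: 2.0}, target_dim=8)
```

Because only the nonzero entries are touched, the original dimension never has to be materialized, which is what makes this kind of projection attractive for sparse feature spaces.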

10.8 | LG | Feb 14, 2012

PAC-Bayesian Policy Evaluation for Reinforcement Learning

Mahdi Milani Fard, Joelle Pineau, Csaba Szepesvari

Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on the accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this bound can be used to perform model selection in a transfer learning scenario. Our empirical results confirm that PAC-Bayesian policy evaluation is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignore them when they are misleading.