Arash Pourhabib

ML
3 papers
8 citations
Novelty: 60%
AI Score: 24

3 Papers

ML · Aug 5, 2015
Sparse Pseudo-input Local Kriging for Large Spatial Datasets with Exogenous Variables

Babak Farmanesh, Arash Pourhabib

We study large-scale spatial systems that contain exogenous variables, e.g., environmental factors that are significant predictors in spatial processes. Building predictive models for such processes is challenging because the large number of observations makes full Kriging computationally inefficient. To reduce computational complexity, this paper proposes Sparse Pseudo-input Local Kriging (SPLK), which uses hyperplanes to partition the domain into smaller subdomains and then applies a sparse approximation of full Kriging to each subdomain. We also develop an optimization procedure to find the desired hyperplanes. To alleviate discontinuity in the global predictor, we impose continuity constraints on the boundaries between neighboring subdomains. Furthermore, partitioning the domain into smaller subdomains makes it possible to use different covariance-function parameter values in each region, so heterogeneity in the data structure can be captured effectively. Numerical experiments demonstrate that SPLK outperforms, or is comparable to, the algorithms commonly applied to spatial datasets.
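
The partition-then-fit idea in this abstract can be illustrated with a minimal sketch. It is not the authors' SPLK implementation: it assumes a single hyperplane that is fixed by hand rather than optimized, fits a full zero-mean Gaussian process (simple Kriging) on each side instead of a sparse pseudo-input approximation, and imposes no continuity constraints at the boundary. All function names and parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential covariance between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def fit_gp(x, y, length_scale, noise=1e-4):
    """Precompute the weights of a zero-mean GP predictor on one subdomain."""
    k = rbf_kernel(x, x, length_scale) + noise * np.eye(len(x))
    return {"x": x, "alpha": np.linalg.solve(k, y), "length_scale": length_scale}

def predict_gp(model, x_new):
    """Posterior mean of the GP at the rows of x_new."""
    k_star = rbf_kernel(x_new, model["x"], model["length_scale"])
    return k_star @ model["alpha"]

def fit_local_kriging(x, y, normal, offset, length_scales):
    """Split the data with the hyperplane normal . x = offset; fit one GP per side.

    Each side gets its own length scale, which is how the sketch mimics
    region-specific covariance parameters.
    """
    side = (x @ normal > offset).astype(int)
    models = [fit_gp(x[side == s], y[side == s], length_scales[s]) for s in (0, 1)]
    return {"normal": normal, "offset": offset, "models": models}

def predict_local_kriging(local, x_new):
    """Route each query point to the GP of the subdomain it falls in."""
    side = (x_new @ local["normal"] > local["offset"]).astype(int)
    out = np.empty(len(x_new))
    for s in (0, 1):
        mask = side == s
        if mask.any():
            out[mask] = predict_gp(local["models"][s], x_new[mask])
    return out

# Toy data (assumed): the response oscillates faster on the right half of the domain.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.where(x[:, 0] > 0.5, np.sin(20 * x[:, 0]), np.sin(3 * x[:, 0]))
local = fit_local_kriging(
    x, y, normal=np.array([1.0, 0.0]), offset=0.5, length_scales=(0.3, 0.05)
)
print(predict_local_kriging(local, rng.uniform(0.0, 1.0, size=(5, 2))))
```

Because each subdomain solves its own smaller linear system, the cubic cost of Kriging applies to the subdomain size rather than to the full dataset, which is the computational motivation the abstract describes.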

ML · Aug 5, 2015
A Bayesian framework for functional calibration of expensive computational models through non-isometric matching

Babak Farmanesh, Arash Pourhabib, Balabhaskar Balasundaram et al.

We study statistical calibration, i.e., adjusting features of a computational model that are not observable or controllable in its associated physical system. We focus on functional calibration, which arises in many manufacturing processes where the unobservable features, called calibration variables, are a function of the input variables. A major challenge in many applications is that computational models are expensive and can only be evaluated a limited number of times. Furthermore, without strong assumptions, the calibration variables are not identifiable. We propose Bayesian non-isometric matching calibration (BNMC), which allows calibration of expensive computational models with only a limited number of samples taken from the computational model and its associated physical system. BNMC replaces the computational model with a dynamic Gaussian process (GP) whose parameters are trained in the calibration procedure. To resolve the identifiability issue, we present the calibration problem from the geometric perspective of non-isometric curve-to-surface matching, which enables us to use combinatorial optimization techniques to extract the information needed to construct prior distributions. Our numerical experiments demonstrate that, in terms of prediction accuracy, BNMC outperforms, or is comparable to, other existing calibration frameworks.
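
The geometric view of functional calibration can be made concrete with a toy sketch: the computational model defines a surface over (input, calibration value), the physical observations form a curve over the input, and calibration picks one calibration value per input so that the path on the surface follows the curve. The code below is not BNMC. It is a deterministic illustration that assumes a cheap model, a discrete grid of candidate calibration values, and a hypothetical smoothness penalty that stands in for the extra structure needed for identifiability; the path is found by dynamic programming.

```python
def calibrate_path(xs, ys, thetas, model, smooth=1.0):
    """Choose one calibration value per input by dynamic programming.

    Minimizes sum_i (model(x_i, theta_i) - y_i)^2
            + smooth * sum_i (theta_i - theta_{i-1})^2
    over theta_i restricted to the candidate grid `thetas`.
    """
    m = len(thetas)
    # Misfit between the model surface and the physical curve at each grid cell.
    fit = [[(model(x, t) - y) ** 2 for t in thetas] for x, y in zip(xs, ys)]
    cost = fit[0][:]   # best cost of any path ending at each theta for the first input
    back = []          # back-pointers, one list per subsequent input
    for i in range(1, len(xs)):
        new_cost, ptr = [], []
        for j in range(m):
            step = [cost[k] + smooth * (thetas[j] - thetas[k]) ** 2 for k in range(m)]
            best = min(range(m), key=step.__getitem__)
            new_cost.append(fit[i][j] + step[best])
            ptr.append(best)
        cost = new_cost
        back.append(ptr)
    # Trace the cheapest path backwards from the last input.
    j = min(range(m), key=cost.__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [thetas[j] for j in path]

# Toy setup (assumed): model output is theta * x, and the true theta grows with x.
def toy_model(x, theta):
    return theta * x

inputs = [0.5, 1.0, 1.5, 2.0]
observed = [0.75, 2.0, 3.75, 6.0]          # generated with theta(x) = 1 + x
grid = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(calibrate_path(inputs, observed, grid, toy_model, smooth=0.01))
```

In the setting of the abstract the model is expensive, so the misfit table could not be filled by direct evaluation; a GP surrogate would take the place of `toy_model`, and the recovered path would inform prior distributions rather than serve as the final estimate.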

ML · Aug 5, 2015
Empirical Similarity for Absent Data Generation in Imbalanced Classification

Arash Pourhabib

When the training data in a two-class classification problem is dominated by one class, most classification techniques fail to correctly identify the data points belonging to the underrepresented class. We propose Similarity-based Imbalanced Classification (SBIC), which learns patterns in the training data based on an empirical similarity function. To take the imbalanced structure of the training data into account, SBIC uses the concept of absent data, i.e., data from the minority class that can help locate the boundary between the two classes more accurately. SBIC simultaneously optimizes the weights of the empirical similarity function and finds the locations of absent data points. As such, SBIC uses an embedded mechanism for synthetic data generation that does not modify the training dataset but alters the algorithm to suit imbalanced datasets. SBIC therefore draws on both major schools of thought in imbalanced classification: like cost-sensitive approaches, it operates at the algorithm level to handle imbalanced structures; and like synthetic data generation approaches, it uses the properties of unobserved data points from the minority class. Applying SBIC to imbalanced datasets suggests that it is comparable to, and in some cases outperforms, other commonly used classification techniques for imbalanced datasets.
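
A small sketch may help show how absent data can shift a similarity-based prediction without changing the training set. It is not the SBIC algorithm: SBIC optimizes the similarity weights and the absent-point locations jointly, whereas the sketch takes both as given. The exponential form of the similarity function, the function names, and the toy points are all assumptions made for illustration.

```python
import math

def similarity(a, b, weights):
    """Similarity that decays with the weighted squared distance between a and b."""
    d2 = sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b))
    return math.exp(-d2)

def minority_score(x, points, labels, weights, absent=()):
    """Similarity-weighted average of labels (1 = minority, 0 = majority).

    Absent points are treated as extra minority-class evidence at prediction
    time; the training points and labels themselves are left untouched.
    """
    num = den = 0.0
    for p, y in zip(points, labels):
        s = similarity(x, p, weights)
        num += s * y
        den += s
    for p in absent:
        s = similarity(x, p, weights)
        num += s  # an absent point always carries the minority label 1
        den += s
    return num / den if den > 0 else 0.0

# Toy imbalanced data (assumed): four majority points and one minority point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
labels = [0, 0, 0, 0, 1]
weights = (1.0, 1.0)
query = (2.0, 2.0)

print(minority_score(query, points, labels, weights))
print(minority_score(query, points, labels, weights, absent=[(2.5, 2.5)]))
```

With a decision threshold of 0.5, the query is assigned to the majority class when only the observed data is used and to the minority class once the hypothetical absent point is included, which is the boundary-shifting effect the abstract attributes to absent data.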