Raed Al Kontar

LG
h-index34
25papers
288citations
Novelty55%
AI Score56

25 Papers

64.2LGApr 20Code
Collaborative Contextual Bayesian Optimization

Chih-Yu Chang, Qiyuan Chen, Tianhan Gao et al.

Discovering optimal designs through sequential data collection is essential in many real-world applications. While Bayesian Optimization (BO) has achieved remarkable success in this setting, growing attention has recently turned to context-specific optimal design, formalized as Contextual Bayesian Optimization (CBO). Unlike BO, CBO is inherently more challenging as it must approximate an entire mapping from the context space to its corresponding optimal design, requiring simultaneous exploration across contexts and exploitation within each. In many modern applications, such tasks arise across multiple potentially heterogeneous but related clients, where collaboration can significantly improve learning efficiency. We propose CCBO, Collaborative Contextual Bayesian Optimization, a unified framework enabling multiple clients to jointly perform CBO with controllable contexts, supporting both online collaboration and offline initialization from peers' historical beliefs, with an optional privacy-preserving communication mechanism. We establish sublinear regret guarantees and demonstrate, through extensive simulations and a real-world hot rolling application, that CCBO achieves substantial improvements over existing approaches even under client heterogeneity. The code to reproduce the results can be found at https://github.com/cchihyu/Collaborative-Contextual-Bayesian-Optimization

LGJul 17, 2022
Personalized PCA: Decoupling Shared and Unique Features

Naichen Shi, Raed Al Kontar

In this paper, we tackle a significant challenge in PCA: heterogeneity. When data are collected from different sources with heterogeneous trends while still sharing some congruency, it is critical to extract shared knowledge while retaining the unique features of each source. To this end, we propose personalized PCA (PerPCA), which uses mutually orthogonal global and local principal components to encode both unique and shared features. We show that, under mild conditions, both unique and shared features can be identified and recovered by a constrained optimization problem, even if the covariance matrices are immensely different. Also, we design a fully federated algorithm inspired by distributed Stiefel gradient descent to solve the problem. The algorithm introduces a new group of operations called generalized retractions to handle orthogonality constraints, and only requires global PCs to be shared across sources. We prove the linear convergence of the algorithm under suitable assumptions. Comprehensive numerical experiments highlight PerPCA's superior performance in feature extraction and prediction from heterogeneous datasets. As a systematic approach to decouple shared and unique features from heterogeneous datasets, PerPCA finds applications in several tasks, including video segmentation, topic extraction, and feature clustering.

APJun 15, 2022
Federated Data Analytics: A Study on Linear Models

Xubo Yue, Raed Al Kontar, Ana María Estrada Gómez

As edge devices become increasingly powerful, data analytics are gradually moving from a centralized to a decentralized regime where edge compute resources are exploited to process more of the data locally. This regime of analytics is coined as federated data analytics (FDA). In spite of the recent success stories of FDA, most literature focuses exclusively on deep neural networks. In this work, we take a step back to develop an FDA treatment for one of the most fundamental statistical models: linear regression. Our treatment is built upon hierarchical modeling that allows borrowing strength across multiple groups. To this end, we propose two federated hierarchical model structures that provide a shared representation across devices to facilitate information sharing. Notably, our proposed frameworks are capable of providing uncertainty quantification, variable selection, hypothesis testing and fast adaptation to new unseen data. We validate our methods on a range of real-life applications including condition monitoring for aircraft engines. The results show that our FDA treatment for linear models can serve as a competing benchmark model for future development of federated algorithms.

LGJun 25, 2023
Collaborative and Distributed Bayesian Optimization via Consensus: Showcasing the Power of Collaboration for Optimal Design

Xubo Yue, Raed Al Kontar, Albert S. Berahas et al.

Optimal design is a critical yet challenging task within many applications. This challenge arises from the need for extensive trial and error, often done through simulations or running field experiments. Fortunately, sequential optimal design, also referred to as Bayesian optimization when using surrogates with a Bayesian flavor, has played a key role in accelerating the design process through efficient sequential sampling strategies. However, a key opportunity exists nowadays. The increased connectivity of edge devices sets forth a new collaborative paradigm for Bayesian optimization. A paradigm whereby different clients collaboratively borrow strength from each other by effectively distributing their experimentation efforts to improve and fast-track their optimal design process. To this end, we bring the notion of consensus to Bayesian optimization, where clients agree (i.e., reach a consensus) on their next-to-sample designs. Our approach provides a generic and flexible framework that can incorporate different collaboration mechanisms. In lieu of this, we propose transitional collaborative mechanisms where clients initially rely more on each other to maneuver through the early stages with scant data, then, at the late stages, focus on their own objectives to get client-specific solutions. Theoretically, we show the sub-linear growth in regret for our proposed framework. Empirically, through simulated datasets and a real-world collaborative sensor design experiment, we show that our framework can effectively accelerate and improve the optimal design process and benefit all participants.

CVMar 16, 2022
A Continual Learning Framework for Adaptive Defect Classification and Inspection

Wenbo Sun, Raed Al Kontar, Judy Jin et al.

Machine-vision-based defect classification techniques have been widely adopted for automatic quality inspection in manufacturing processes. This article describes a general framework for classifying defects from high volume data batches with efficient inspection of unlabelled samples. The concept is to construct a detector to identify new defect types, send them to the inspection station for labelling, and dynamically update the classifier in an efficient manner that reduces both storage and computational needs imposed by data samples of previously observed batches. Both a simulation study on image classification and a case study on surface defect detection via 3D point clouds are performed to demonstrate the effectiveness of the proposed method.

LGSep 7, 2023
Personalized Tucker Decomposition: Modeling Commonality and Peculiarity on Tensor Data

Jiuyun Hu, Naichen Shi, Raed Al Kontar et al.

We propose personalized Tucker decomposition (perTucker) to address the limitations of traditional tensor decomposition methods in capturing heterogeneity across different datasets. perTucker decomposes tensor data into shared global components and personalized local components. We introduce a mode orthogonality assumption and develop a proximal gradient regularized block coordinate descent algorithm that is guaranteed to converge to a stationary point. By learning unique and common representations across datasets, we demonstrate perTucker's effectiveness in anomaly detection, client classification, and clustering through a simulation study and two case studies on solar flare detection and tonnage signal classification.

LGAug 24, 2022
Rethinking Cost-sensitive Classification in Deep Learning via Adversarial Data Augmentation

Qiyuan Chen, Raed Al Kontar, Maher Nouiehed et al.

Cost-sensitive classification is critical in applications where misclassification errors widely vary in cost. However, over-parameterization poses fundamental challenges to the cost-sensitive modeling of deep neural networks (DNNs). The ability of a DNN to fully interpolate a training dataset can render a DNN, evaluated purely on the training set, ineffective in distinguishing a cost-sensitive solution from its overall accuracy maximization counterpart. This necessitates rethinking cost-sensitive classification in DNNs. To address this challenge, this paper proposes a cost-sensitive adversarial data augmentation (CSADA) framework to make over-parameterized models cost-sensitive. The overarching idea is to generate targeted adversarial examples that push the decision boundary in cost-aware directions. These targeted adversarial samples are generated by maximizing the probability of critical misclassifications and used to train a model with more conservative decisions on costly pairs. Experiments on well-known datasets and a pharmacy medication image (PMI) dataset made publicly available show that our method can effectively minimize the overall cost and reduce critical errors, while achieving comparable performance in terms of overall accuracy.

LGOct 12, 2023
SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection

Xiaoyang Song, Wenbo Sun, Maher Nouiehed et al.

Current techniques for Out-of-Distribution (OoD) detection predominantly rely on quantifying predictive uncertainty and incorporating model regularization during the training phase, using either real or synthetic OoD samples. However, methods that utilize real OoD samples lack exploration and are prone to overfit the OoD samples at hand. Whereas synthetic samples are often generated based on features extracted from training data, rendering them less effective when the training and OoD data are highly overlapped in the feature space. In this work, we propose a Wasserstein-score-based generative adversarial training scheme to enhance OoD detection accuracy, which, for the first time, performs data augmentation and exploration simultaneously under the supervision of limited OoD samples. Specifically, the generator explores OoD spaces and generates synthetic OoD samples using feedback from the discriminator, while the discriminator exploits both the observed and synthesized samples for OoD detection using a predefined Wasserstein score. We provide theoretical guarantees that the optimal solutions of our generative scheme are statistically achievable through adversarial training in empirical settings. We then demonstrate that the proposed method outperforms state-of-the-art techniques on various computer vision datasets and exhibits superior generalizability to unseen OoD data.

LGMay 20, 2025Code
LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization

Chih-Yu Chang, Milad Azvar, Chinedum Okwudire et al.

Bayesian optimization (BO) is a sequential decision-making tool widely used for optimizing expensive black-box functions. Recently, Large Language Models (LLMs) have shown remarkable adaptability in low-data regimes, making them promising tools for black-box optimization by leveraging contextual knowledge to propose high-quality query points. However, relying solely on LLMs as optimization agents introduces risks due to their lack of explicit surrogate modeling and calibrated uncertainty, as well as their inherently opaque internal mechanisms. This structural opacity makes it difficult to characterize or control the exploration-exploitation trade-off, ultimately undermining theoretical tractability and reliability. To address this, we propose LLINBO: LLM-in-the-Loop BO, a hybrid framework for BO that combines LLMs with statistical surrogate experts (e.g., Gaussian Processes (GP)). The core philosophy is to leverage contextual reasoning strengths of LLMs for early exploration, while relying on principled statistical models to guide efficient exploitation. Specifically, we introduce three mechanisms that enable this collaboration and establish their theoretical guarantees. We end the paper with a real-life proof-of-concept in the context of 3D printing. The code to reproduce the results can be found at https://github.com/UMDataScienceLab/LLM-in-the-Loop-BO.

57.2LGMay 14
Language-Induced Priors for Domain Adaptation

Qiyuan Chen, Jiayu Zhou, Raed Al Kontar

Domain adaptation faces a fundamental paradox in the cold-start regime. When target data is scarce, statistical methods fail to distinguish relevant source domains from irrelevant ones, which often leads to negative transfer. In this paper, we address this challenge by leveraging expert textual descriptions of the target domain, a resource that is often available but overlooked. We propose a probabilistic framework that translates these semantic descriptions into a choice model, namely a Language-Induced Prior (LIP), that learns the preferences from a pretrained Large Language Model (LLM). The LIP is then integrated into an Expectation-Maximization algorithm to identify source relevance. Methodologically, this framework is compatible with any parametric model where a likelihood is available. It allows the LIP to guide the selection of sources when target signals are weak, while gradually refining these choices as samples accumulate. Theoretically, we prove that the estimator roughly matches an oracle cold-start MSE under a correct prior, while remaining asymptotically consistent regardless of the quality of the LIP. Empirically, we validated the framework on a descriptive (Gaussian estimation), a predictive (C-MAPSS dataset), and a prescriptive task (MuJoCo hopper).

CLJun 11, 2025Code
Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models

Haoyi Song, Ruihan Ji, Naichen Shi et al.

Large language models (LLMs) have transformed natural language processing, but their reliable deployment requires effective uncertainty quantification (UQ). Existing UQ methods are often heuristic and lack a probabilistic interpretation. This paper begins by providing a theoretical justification for the role of perturbations in UQ for LLMs. We then introduce a dual random walk perspective, modeling input-output pairs as two Markov chains with transition probabilities defined by semantic similarity. Building on this, we propose a fully probabilistic framework based on an inverse model, which quantifies uncertainty by evaluating the diversity of the input space conditioned on a given output through systematic perturbations. Within this framework, we define a new uncertainty measure, Inv-Entropy. A key strength of our framework is its flexibility: it supports various definitions of uncertainty measures, embeddings, perturbation strategies, and similarity metrics. We also propose GAAP, a perturbation algorithm based on genetic algorithms, which enhances the diversity of sampled inputs. In addition, we introduce a new evaluation metric, Temperature Sensitivity of Uncertainty (TSU), which directly assesses uncertainty without relying on correctness as a proxy. Extensive experiments demonstrate that Inv-Entropy outperforms existing semantic UQ methods. The code to reproduce the results can be found at https://github.com/UMDataScienceLab/Uncertainty-Quantification-for-LLMs.

LGMar 25, 2024
Real-time Adaptation for Condition Monitoring Signal Prediction using Label-aware Neural Processes

Seokhyun Chung, Raed Al Kontar

Building a predictive model that rapidly adapts to real-time condition monitoring (CM) signals is critical for engineering systems/units. Unfortunately, many current methods suffer from a trade-off between representation power and agility in online settings. For instance, parametric methods that assume an underlying functional form for CM signals facilitate efficient online prediction updates. However, this simplification leads to vulnerability to model specifications and an inability to capture complex signals. On the other hand, approaches based on over-parameterized or non-parametric models can excel at explaining complex nonlinear signals, but real-time updates for such models pose a challenging task. In this paper, we propose a neural process-based approach that addresses this trade-off. It encodes available observations within a CM signal into a representation space and then reconstructs the signal's history and evolution for prediction. Once trained, the model can encode an arbitrary number of observations without requiring retraining, enabling on-the-spot real-time predictions along with quantified uncertainty and can be readily updated as more online data is gathered. Furthermore, our model is designed to incorporate qualitative information (i.e., labels) from individual units. This integration not only enhances individualized predictions for each unit but also enables joint inference for both signals and their associated labels. Numerical studies on both synthetic and real-world data in reliability engineering highlight the advantageous features of our model in real-time adaptation, enhanced signal prediction with uncertainty quantification, and joint prediction for labels and signals.

LGNov 12, 2024
Collaborative and Federated Black-box Optimization: A Bayesian Optimization Perspective

Raed Al Kontar

We focus on collaborative and federated black-box optimization (BBOpt), where agents optimize their heterogeneous black-box functions through collaborative sequential experimentation. From a Bayesian optimization perspective, we address the fundamental challenges of distributed experimentation, heterogeneity, and privacy within BBOpt, and propose three unifying frameworks to tackle these issues: (i) a global framework where experiments are centrally coordinated, (ii) a local framework that allows agents to make decisions based on minimal shared information, and (iii) a predictive framework that enhances local surrogates through collaboration to improve decision-making. We categorize existing methods within these frameworks and highlight key open questions to unlock the full potential of federated BBOpt. Our overarching goal is to shift federated learning from its predominantly descriptive/predictive paradigm to a prescriptive one, particularly in the context of BBOpt - an inherently sequential decision-making problem.

MEOct 18, 2024
The Traveling Bandit: A Framework for Bayesian Optimization with Movement Costs

Qiyuan Chen, Raed Al Kontar

This paper introduces a framework for Bayesian Optimization (BO) with metric movement costs, addressing a critical challenge in practical applications where input alterations incur varying costs. Our approach is a convenient plug-in that seamlessly integrates with the existing literature on batched algorithms, where designs within batches are observed following the solution of a Traveling Salesman Problem. The proposed method provides a theoretical guarantee of convergence in terms of movement costs for BO. Empirically, our method effectively reduces average movement costs over time while maintaining comparable regret performance to conventional BO methods. This framework also shows promise for broader applications in various bandit settings with movement costs.

LGMar 21, 2024
Triple Component Matrix Factorization: Untangling Global, Local, and Noisy Components

Naichen Shi, Salar Fattahi, Raed Al Kontar

In this work, we study the problem of common and unique feature extraction from noisy data. When we have N observation matrices from N different and associated sources corrupted by sparse and potentially gross noise, can we recover the common and unique components from these noisy observations? This is a challenging task as the number of parameters to estimate is approximately thrice the number of observations. Despite the difficulty, we propose an intuitive alternating minimization algorithm called triple component matrix factorization (TCMF) to recover the three components exactly. TCMF is distinguished from existing works in literature thanks to two salient features. First, TCMF is a principled method to separate the three components given noisy observations provably. Second, the bulk of the computation in TCMF can be distributed. On the technical side, we formulate the problem as a constrained nonconvex nonsmooth optimization problem. Despite the intricate nature of the problem, we provide a Taylor series characterization of its solution by solving the corresponding Karush-Kuhn-Tucker conditions. Using this characterization, we can show that the alternating minimization algorithm makes significant progress at each iteration and converges into the ground truth at a linear rate. Numerical experiments in video segmentation and anomaly detection highlight the superior feature extraction abilities of TCMF.

LGSep 3, 2025
Online Learning of Optimal Sequential Testing Policies

Qiyuan Chen, Raed Al Kontar

This paper studies an online learning problem that seeks optimal testing policies for a stream of subjects, each of whom can be evaluated through a sequence of candidate tests drawn from a common pool. We refer to this problem as the Online Testing Problem (OTP). Although conducting every candidate test for a subject provides more information, it is often preferable to select only a subset when tests are correlated and costly, and make decisions with partial information. If the joint distribution of test outcomes were known, the problem could be cast as a Markov Decision Process (MDP) and solved exactly. In practice, this distribution is unknown and must be learned online as subjects are tested. When a subject is not fully tested, the resulting missing data can bias estimates, making the problem fundamentally harder than standard episodic MDPs. We prove that the minimax regret must scale at least as $Ω(T^{\frac{2}{3}})$, in contrast to the $Θ(\sqrt{T})$ rate in episodic MDPs, revealing the difficulty introduced by missingness. This elevated lower bound is then matched by an Explore-Then-Commit algorithm whose cumulative regret is $\tilde{O}(T^{\frac{2}{3}})$ for both discrete and Gaussian distributions. To highlight the consequence of missingness-dependent rewards in OTP, we study a variant called the Online Cost-sensitive Maximum Entropy Sampling Problem, where rewards are independent of missing data. This structure enables an iterative-elimination algorithm that achieves $\tilde{O}(\sqrt{T})$ regret, breaking the $Ω(T^{\frac{2}{3}})$ lower bound for OTP. Numerical results confirm our theory in both settings. Overall, this work deepens the understanding of the exploration--exploitation trade-off under missing data and guides the design of efficient sequential testing policies.

LGJun 13, 2025
A Collaborative Process Parameter Recommender System for Fleets of Networked Manufacturing Machines -- with Application to 3D Printing

Weishi Wang, Sicong Guo, Chenhuan Jiang et al.

Fleets of networked manufacturing machines of the same type, that are collocated or geographically distributed, are growing in popularity. An excellent example is the rise of 3D printing farms, which consist of multiple networked 3D printers operating in parallel, enabling faster production and efficient mass customization. However, optimizing process parameters across a fleet of manufacturing machines, even of the same type, remains a challenge due to machine-to-machine variability. Traditional trial-and-error approaches are inefficient, requiring extensive testing to determine optimal process parameters for an entire fleet. In this work, we introduce a machine learning-based collaborative recommender system that optimizes process parameters for each machine in a fleet by modeling the problem as a sequential matrix completion task. Our approach leverages spectral clustering and alternating least squares to iteratively refine parameter predictions, enabling real-time collaboration among the machines in a fleet while minimizing the number of experimental trials. We validate our method using a mini 3D printing farm consisting of ten 3D printers for which we optimize acceleration and speed settings to maximize print quality and productivity. Our approach achieves significantly faster convergence to optimal process parameters compared to non-collaborative matrix completion.

LGMay 24, 2023
Personalized Dictionary Learning for Heterogeneous Datasets

Geyu Liang, Naichen Shi, Raed Al Kontar et al.

We introduce a relevant yet challenging problem named Personalized Dictionary Learning (PerDL), where the goal is to learn sparse linear representations from heterogeneous datasets that share some commonality. In PerDL, we model each dataset's shared and unique features as global and local dictionaries. Challenges for PerDL not only are inherited from classical dictionary learning (DL), but also arise due to the unknown nature of the shared and unique features. In this paper, we rigorously formulate this problem and provide conditions under which the global and local dictionaries can be provably disentangled. Under these conditions, we provide a meta-algorithm called Personalized Matching and Averaging (PerMA) that can recover both global and local dictionaries from heterogeneous datasets. PerMA is highly efficient; it converges to the ground truth at a linear rate under suitable conditions. Moreover, it automatically borrows strength from strong learners to improve the prediction of weak learners. As a general framework for extracting global and local dictionaries, we show the application of PerDL in different learning tasks, such as training with imbalanced datasets and video surveillance.

MLNov 28, 2021
Federated Gaussian Process: Convergence, Automatic Personalization and Multi-fidelity Modeling

Xubo Yue, Raed Al Kontar

In this paper, we propose \texttt{FGPR}: a Federated Gaussian process ($\mathcal{GP}$) regression framework that uses an averaging strategy for model aggregation and stochastic gradient descent for local client computations. Notably, the resulting global model excels in personalization as \texttt{FGPR} jointly learns a global $\mathcal{GP}$ prior across all clients. The predictive posterior then is obtained by exploiting this prior and conditioning on local data which encodes personalized features from a specific client. Theoretically, we show that \texttt{FGPR} converges to a critical point of the full log-likelihood function, subject to statistical error. Through extensive case studies we show that \texttt{FGPR} excels in a wide range of applications and is a promising approach for privacy-preserving multi-fidelity data modeling.

MLNov 19, 2021
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits

Hao Chen, Lili Zheng, Raed Al Kontar et al.

Stochastic gradient descent (SGD) and its variants have established themselves as the go-to algorithms for large-scale machine learning problems with independent samples due to their generalization performance and intrinsic computational advantage. However, the fact that the stochastic gradient is a biased estimator of the full gradient with correlated samples has led to the lack of theoretical understanding of how SGD behaves under correlated settings and hindered its use in such cases. In this paper, we focus on hyperparameter estimation for the Gaussian process (GP) and take a step forward towards breaking the barrier by proving minibatch SGD converges to a critical point of the full log-likelihood loss function, and recovers model hyperparameters with rate $O(\frac{1}{K})$ for $K$ iterations, up to a statistical error term depending on the minibatch size. Our theoretical guarantees hold provided that the kernel functions exhibit exponential or polynomial eigendecay which is satisfied by a wide range of kernels commonly used in GPs. Numerical studies on both simulated and real datasets demonstrate that minibatch SGD has better generalization over state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored, data size regime for GPs.

LGAug 5, 2021
GIFAIR-FL: A Framework for Group and Individual Fairness in Federated Learning

Xubo Yue, Maher Nouiehed, Raed Al Kontar

In this paper we propose \texttt{GIFAIR-FL}: a framework that imposes \textbf{G}roup and \textbf{I}ndividual \textbf{FAIR}ness to \textbf{F}ederated \textbf{L}earning settings. By adding a regularization term, our algorithm penalizes the spread in the loss of client groups to drive the optimizer to fair solutions. Our framework \texttt{GIFAIR-FL} can accommodate both global and personalized settings. Theoretically, we show convergence in non-convex and strongly convex settings. Our convergence guarantees hold for both $i.i.d.$ and non-$i.i.d.$ data. To demonstrate the empirical performance of our algorithm, we apply our method to image classification and text prediction tasks. Compared to existing algorithms, our method shows improved fairness results while retaining superior or similar prediction accuracy.

MLJul 21, 2021
Fed-ensemble: Improving Generalization through Model Ensembling in Federated Learning

Naichen Shi, Fan Lai, Raed Al Kontar et al.

In this paper we propose Fed-ensemble: a simple approach that bringsmodel ensembling to federated learning (FL). Instead of aggregating localmodels to update a single global model, Fed-ensemble uses random permutations to update a group of K models and then obtains predictions through model averaging. Fed-ensemble can be readily utilized within established FL methods and does not impose a computational overhead as it only requires one of the K models to be sent to a client in each communication round. Theoretically, we show that predictions on newdata from all K models belong to the same predictive posterior distribution under a neural tangent kernel regime. This result in turn sheds light onthe generalization advantages of model averaging. We also illustrate thatFed-ensemble has an elegant Bayesian interpretation. Empirical results show that our model has superior performance over several FL algorithms,on a wide range of data sets, and excels in heterogeneous settings often encountered in FL applications.

LGNov 10, 2020
SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization

Xubo Yue, Maher Nouiehed, Raed Al Kontar

In an effort to improve generalization in deep learning and automate the process of learning rate scheduling, we propose SALR: a sharpness-aware learning rate update technique designed to recover flat minimizers. Our method dynamically updates the learning rate of gradient-based optimizers based on the local sharpness of the loss function. This allows optimizers to automatically increase learning rates at sharp valleys to increase the chance of escaping them. We demonstrate the effectiveness of SALR when adopted by various algorithms over a broad range of networks. Our experiments indicate that SALR improves generalization, converges faster, and drives solutions to significantly flatter regions.

MLFeb 19, 2020
Weakly-supervised Multi-output Regression via Correlated Gaussian Processes

Seokhyun Chung, Raed Al Kontar, Zhenke Wu

Multi-output regression seeks to borrow strength and leverage commonalities across different but related outputs in order to enhance learning and prediction accuracy. A fundamental assumption is that the output/group membership labels for all observations are known. This assumption is often violated in real applications. For instance, in healthcare datasets, sensitive attributes such as ethnicity are often missing or unreported. To this end, we introduce a weakly-supervised multi-output model based on dependent Gaussian processes. Our approach is able to leverage data without complete group labels or possibly only prior belief on group memberships to enhance accuracy across all outputs. Through intensive simulations and case studies on an Insulin, Testosterone and Bodyfat dataset, we show that our model excels in multi-output settings with missing labels, while being competitive in traditional fully labeled settings. We end by highlighting the possible use of our approach in fair inference and sequential decision-making.

LGNov 4, 2019
Why Non-myopic Bayesian Optimization is Promising and How Far Should We Look-ahead? A Study via Rollout

Xubo Yue, Raed Al Kontar

Lookahead, also known as non-myopic, Bayesian optimization (BO) aims to find optimal sampling policies through solving a dynamic program (DP) that maximizes a long-term reward over a rolling horizon. Though promising, lookahead BO faces the risk of error propagation through its increased dependence on a possibly mis-specified model. In this work we focus on the rollout approximation for solving the intractable DP. We first prove the improving nature of rollout in tackling lookahead BO and provide a sufficient condition for the used heuristic to be rollout improving. We then provide both a theoretical and practical guideline to decide on the rolling horizon stagewise. This guideline is built on quantifying the negative effect of a mis-specified model. To illustrate our idea, we provide case studies on both single and multi-information source BO. Empirical results show the advantageous properties of our method over several myopic and non-myopic BO algorithms.