LGMay 27
Revisiting ML Training under Fully Homomorphic Encryption: Convergence Guarantees, Differential Privacy, and Efficient AlgorithmsYvonne Zhou, Mingyu Liang, Ivan Brugere et al.
We present the first theoretical convergence analysis of machine learning training under fully homomorphic encryption (FHE), combined with a differentially private (DP) training algorithm tailored to encrypted computation. Our approach improves computational efficiency over standard differentially private gradient descent (DP-GD) while achieving comparable utility. In particular, we prove convergence of approximate gradient descent using polynomial approximations of activation and loss functions, which are required for FHE compatibility. To preserve privacy in downstream tasks, we integrate differential privacy without relying on costly per-sample gradient clipping, enabling scalable encrypted learning. We also provide data-independent hyperparameter selection and theoretically grounded strategies for polynomial approximation which can be of independent interest. Together, these contributions advance the feasibility of efficient, private, and secure machine learning on sensitive data.
LGMar 14, 2022
Optimal Admission Control for Multiclass Queues with Time-Varying Arrival Rates via State AbstractionMarc Rigter, Danial Dervovic, Parisa Hassanzadeh et al.
We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks into a no buffer queue which are processed by $N$ identical servers. Each task has a price, which is a positive real number, and a class. Each class of task has a different price distribution and service rate, and arrives according to an inhomogenous Poisson process. The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon. We formulate the problem as a discrete time Markov Decision Process (MDP) with a hybrid state space. We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly. Moreover, we prove that as the time step is reduced, the discrete time solution approaches the optimal solution to the original continuous time problem. To improve the scalability of our approach to a greater number of task classes, we present an approximation based on state abstraction. We validate our approach on synthetic data, as well as a real financial fraud data set, which is the motivating application for this work.
AIJul 13, 2023
On the Connection between Game-Theoretic Feature Attributions and Counterfactual ExplanationsEmanuele Albini, Shubham Sharma, Saumitra Mishra et al.
Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions, and counterfactual explanations. These classes of approaches have been largely studied independently and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactuals explanations. After motivating operative changes to Shapley values based feature attributions and counterfactual explanations, we prove that, under conditions, they are in fact equivalent. We then extend the equivalency result to game-theoretic solution concepts beyond Shapley values. Moreover, through the analysis of the conditions of such equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.
LGNov 11, 2022
Rethinking Log Odds: Linear Probability Modelling and Expert Advice in Interpretable Machine LearningDanial Dervovic, Nicolas Marchesotti, Freddy Lecue et al.
We introduce a family of interpretable machine learning models, with two broad additions: Linearised Additive Models (LAMs) which replace the ubiquitous logistic link function in General Additive Models (GAMs); and SubscaleHedge, an expert advice algorithm for combining base models trained on subsets of features called subscales. LAMs can augment any additive binary classification model equipped with a sigmoid link function. Moreover, they afford direct global and local attributions of additive components to the model output in probability space. We argue that LAMs and SubscaleHedge improve the interpretability of their base algorithms. Using rigorous null-hypothesis significance testing on a broad suite of financial modelling data, we show that our algorithms do not suffer from large performance penalties in terms of ROC-AUC and calibration.
LGOct 23, 2023
A Canonical Data Transformation for Achieving Inter- and Within-group FairnessZachary McBride Lazri, Ivan Brugere, Xin Tian et al.
Increases in the deployment of machine learning algorithms for applications that deal with sensitive data have brought attention to the issue of fairness in machine learning. Many works have been devoted to applications that require different demographic groups to be treated fairly. However, algorithms that aim to satisfy inter-group fairness (also called group fairness) may inadvertently treat individuals within the same demographic group unfairly. To address this issue, we introduce a formal definition of within-group fairness that maintains fairness among individuals from within the same group. We propose a pre-processing framework to meet both inter- and within-group fairness criteria with little compromise in accuracy. The framework maps the feature vectors of members from different groups to an inter-group-fair canonical domain before feeding them into a scoring function. The mapping is constructed to preserve the relative relationship between the scores obtained from the unprocessed feature vectors of individuals from the same demographic group, guaranteeing within-group fairness. We apply this framework to the COMPAS risk assessment and Law School datasets and compare its performance in achieving inter-group and within-group fairness to two regularization-based methods.
MLSep 22, 2022
Optimal Stopping with Gaussian ProcessesKshama Dwarakanath, Danial Dervovic, Peyman Tavallali et al.
We propose a novel group of Gaussian Process based algorithms for fast approximate optimal stopping of time series with specific applications to financial markets. We show that structural properties commonly exhibited by financial time series (e.g., the tendency to mean-revert) allow the use of Gaussian and Deep Gaussian Process models that further enable us to analytically evaluate optimal stopping value functions and policies. We additionally quantify uncertainty in the value function by propagating the price model through the optimal stopping analysis. We compare and contrast our proposed methods against a sampling-based method, as well as a deep learning based benchmark that is currently considered the state-of-the-art in the literature. We show that our family of algorithms outperforms benchmarks on three historical time series datasets that include intra-day and end-of-day equity stock prices as well as the daily US treasury yield curve rates.
CLApr 9, 2024Code
Characterizing Multimodal Long-form Summarization: A Case Study on Financial ReportsTianyu Cao, Natraj Raman, Danial Dervovic et al.
As large language models (LLMs) expand the power of natural language processing to handle long inputs, rigorous and systematic analyses are necessary to understand their abilities and behavior. A salient application is summarization, due to its ubiquity and controversy (e.g., researchers have declared the death of summarization). In this paper, we use financial report summarization as a case study because financial reports are not only long but also use numbers and tables extensively. We propose a computational framework for characterizing multimodal long-form summarization and investigate the behavior of Claude 2.0/2.1, GPT-4/3.5, and Cohere. We find that GPT-3.5 and Cohere fail to perform this summarization task meaningfully. For Claude 2 and GPT-4, we analyze the extractiveness of the summary and identify a position bias in LLMs. This position bias disappears after shuffling the input for Claude, which suggests that Claude seems to recognize important information. We also conduct a comprehensive investigation on the use of numeric data in LLM-generated summaries and offer a taxonomy of numeric hallucination. We employ prompt engineering to improve GPT-4's use of numbers with limited success. Overall, our analyses highlight the strong capability of Claude 2 in handling long multimodal inputs compared to GPT-4. The generated summaries and evaluation code are available at https://github.com/ChicagoHAI/characterizing-multimodal-long-form-summarization.
AIMar 10
GenePlan: Evolving Better Generalized PDDL Plans using Large Language ModelsAndrew Murray, Danial Dervovic, Alberto Pozanco et al.
We present GenePlan (GENeralized Evolutionary Planner), a novel framework that leverages large language model (LLM) assisted evolutionary algorithms to generate domain-dependent generalized planners for classical planning tasks described in PDDL. By casting generalized planning as an optimization problem, GenePlan iteratively evolves interpretable Python planners that minimize plan length across diverse problem instances. In empirical evaluation across six existing benchmark domains and two new domains, GenePlan achieved an average SAT score of 0.91, closely matching the performance of the state-of-the-art planners (SAT score 0.93), and significantly outperforming other LLM-based baselines such as chain-of-thought (CoT) prompting (average SAT score 0.64). The generated planners solve new instances rapidly (average 0.49 seconds per task) and at low cost (average $1.82 per domain using GPT-4o).
LGFeb 6, 2024
Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic DataYvonne Zhou, Mingyu Liang, Ivan Brugere et al.
The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
LGAug 20, 2025
ELATE: Evolutionary Language model for Automated Time-series EngineeringAndrew Murray, Danial Dervovic, Michael Cashmore
Time-series prediction involves forecasting future values using machine learning models. Feature engineering, whereby existing features are transformed to make new ones, is critical for enhancing model performance, but is often manual and time-intensive. Existing automation attempts rely on exhaustive enumeration, which can be computationally costly and lacks domain-specific insights. We introduce ELATE (Evolutionary Language model for Automated Time-series Engineering), which leverages a language model within an evolutionary framework to automate feature engineering for time-series data. ELATE employs time-series statistical measures and feature importance metrics to guide and prune features, while the language model proposes new, contextually relevant feature transformations. Our experiments demonstrate that ELATE improves forecasting accuracy by an average of 8.4% across various domains.
LGApr 25, 2025
Model Evaluation in the Dark: Robust Classifier Metrics with Missing LabelsDanial Dervovic, Michael Cashmore
Missing data in supervised learning is well-studied, but the specific issue of missing labels during model evaluation has been overlooked. Ignoring samples with missing values, a common solution, can introduce bias, especially when data is Missing Not At Random (MNAR). We propose a multiple imputation technique for evaluating classifiers using metrics such as precision, recall, and ROC-AUC. This method not only offers point estimates but also a predictive distribution for these quantities when labels are missing. We empirically show that the predictive distribution's location and shape are generally correct, even in the MNAR regime. Moreover, we establish that this distribution is approximately Gaussian and provide finite-sample convergence bounds. Additionally, a robustness proof is presented, confirming the validity of the approximation under a realistic error model.
MAFeb 25, 2025
MAFE: Multi-Agent Fair Environments for Decision-Making SystemsZachary McBride Lazri, Anirudh Nakra, Ivan Brugere et al.
Fairness constraints applied to machine learning (ML) models in static contexts have been shown to potentially produce adverse outcomes among demographic groups over time. To address this issue, emerging research focuses on creating fair solutions that persist over time. While many approaches treat this as a single-agent decision-making problem, real-world systems often consist of multiple interacting entities that influence outcomes. Explicitly modeling these entities as agents enables more flexible analysis of their interventions and the effects they have on a system's underlying dynamics. A significant challenge in conducting research on multi-agent systems is the lack of realistic environments that leverage the limited real-world data available for analysis. To address this gap, we introduce the concept of a Multi-Agent Fair Environment (MAFE) and present and analyze three MAFEs that model distinct social systems. Experimental results demonstrate the utility of our MAFEs as testbeds for developing multi-agent fair algorithms.
LGJun 19, 2024
Are Logistic Models Really Interpretable?Danial Dervovic, Freddy Lécué, Nicolás Marchesotti et al.
The demand for open and trustworthy AI models points towards widespread publishing of model weights. Consumers of these model weights must be able to act accordingly with the information provided. That said, one of the simplest AI classification models, Logistic Regression (LR), has an unwieldy interpretation of its model weights, with greater difficulties when extending LR to generalised additive models. In this work, we show via a User Study that skilled participants are unable to reliably reproduce the action of small LR models given the trained parameters. As an antidote to this, we define Linearised Additive Models (LAMs), an optimal piecewise linear approximation that augments any trained additive model equipped with a sigmoid link function, requiring no retraining. We argue that LAMs are more interpretable than logistic models -- survey participants are shown to solve model reasoning tasks with LAMs much more accurately than with LR given the same information. Furthermore, we show that LAMs do not suffer from large performance penalties in terms of ROC-AUC and calibration with respect to their logistic counterparts on a broad suite of public financial modelling data.
LGJun 4, 2024
Cross-Domain Graph Data Scaling: A Showcase with Diffusion ModelsWenzhuo Tang, Haitao Mao, Danial Dervovic et al.
Models for natural language and images benefit from data scaling behavior: the more data fed into the model, the better they perform. This 'better with more' phenomenon enables the effectiveness of large-scale pre-training on vast amounts of data. However, current graph pre-training methods struggle to scale up data due to heterogeneity across graphs. To achieve effective data scaling, we aim to develop a general model that is able to capture diverse data patterns of graphs and can be utilized to adaptively help the downstream tasks. To this end, we propose UniAug, a universal graph structure augmentor built on a diffusion model. We first pre-train a discrete diffusion model on thousands of graphs across domains to learn the graph structural patterns. In the downstream phase, we provide adaptive enhancement by conducting graph structure augmentation with the help of the pre-trained diffusion model via guided generation. By leveraging the pre-trained diffusion model for structure augmentation, we consistently achieve performance improvements across various downstream tasks in a plug-and-play manner. To the best of our knowledge, this study represents the first demonstration of a data-scaling graph structure augmentor on graphs across domains.
AIMar 14, 2024
Surrogate Assisted Monte Carlo Tree Search in Combinatorial OptimizationSaeid Amiri, Parisa Zehtabi, Danial Dervovic et al.
Industries frequently adjust their facilities network by opening new branches in promising areas and closing branches in areas where they expect low profits. In this paper, we examine a particular class of facility location problems. Our objective is to minimize the loss of sales resulting from the removal of several retail stores. However, estimating sales accurately is expensive and time-consuming. To overcome this challenge, we leverage Monte Carlo Tree Search (MCTS) assisted by a surrogate model that computes evaluations faster. Results suggest that MCTS supported by a fast surrogate function can generate solutions faster while maintaining a consistent solution compared to MCTS that does not benefit from the surrogate function.
LGMar 12, 2024
Balancing Fairness and Accuracy in Data-Restricted Binary ClassificationZachary McBride Lazri, Danial Dervovic, Antigoni Polychroniadou et al.
Applications that deal with sensitive information may have restrictions placed on the data available to a machine learning (ML) classifier. For example, in some applications, a classifier may not have direct access to sensitive attributes, affecting its ability to produce accurate and fair decisions. This paper proposes a framework that models the trade-off between accuracy and fairness under four practical scenarios that dictate the type of data available for analysis. Prior works examine this trade-off by analyzing the outputs of a scoring function that has been trained to implicitly learn the underlying distribution of the feature vector, class label, and sensitive attribute of a dataset. In contrast, our framework directly analyzes the behavior of the optimal Bayesian classifier on this underlying distribution by constructing a discrete approximation it from the dataset itself. This approach enables us to formulate multiple convex optimization problems, which allow us to answer the question: How is the accuracy of a Bayesian classifier affected in different data restricting scenarios when constrained to be fair? Analysis is performed on a set of fairness definitions that include group and individual fairness. Experiments on three datasets demonstrate the utility of the proposed framework as a tool for quantifying the trade-offs among different fairness notions and their distributional dependencies.
LGOct 27, 2021
Counterfactual Shapley Additive ExplanationsEmanuele Albini, Jason Long, Danial Dervovic et al.
Feature attributions are a common paradigm for model explanations due to their simplicity in assigning a single numeric score for each input feature to a model. In the actionable recourse setting, wherein the goal of the explanations is to improve outcomes for model consumers, it is often unclear how feature attributions should be correctly used. With this work, we aim to strengthen and clarify the link between actionable recourse and feature attributions. Concretely, we propose a variant of SHAP, Counterfactual SHAP (CF-SHAP), that incorporates counterfactual information to produce a background dataset for use within the marginal (a.k.a. interventional) Shapley value framework. We motivate the need within the actionable recourse setting for careful consideration of background datasets when using Shapley values for feature attributions with numerous synthetic examples. Moreover, we demonstrate the efficacy of CF-SHAP by proposing and justifying a quantitative score for feature attributions, counterfactual-ability, showing that as measured by this metric, CF-SHAP is superior to existing methods when evaluated on public datasets using tree ensembles.
LGOct 5, 2021
Tradeoffs in Streaming Binary Classification under Limited Inspection ResourcesParisa Hassanzadeh, Danial Dervovic, Samuel Assefa et al.
Institutions are increasingly relying on machine learning models to identify and alert on abnormal events, such as fraud, cyber attacks and system failures. These alerts often need to be manually investigated by specialists. Given the operational cost of manual inspections, the suspicious events are selected by alerting systems with carefully designed thresholds. In this paper, we consider an imbalanced binary classification problem, where events arrive sequentially and only a limited number of suspicious events can be inspected. We model the event arrivals as a non-homogeneous Poisson process, and compare various suspicious event selection methods including those based on static and adaptive thresholds. For each method, we analytically characterize the tradeoff between the minority-class detection rate and the inspection capacity as a function of the data class imbalance and the classifier confidence score densities. We implement the selection methods on a real public fraud detection dataset and compare the empirical results with analytical bounds. Finally, we investigate how class imbalance and the choice of classifier impact the tradeoff.
LGJun 29, 2021
Counterfactual Explanations for Arbitrary Regression ModelsThomas Spooner, Danial Dervovic, Jason Long et al.
We present a new method for counterfactual explanations (CFEs) based on Bayesian optimisation that applies to both classification and regression models. Our method is a globally convergent search algorithm with support for arbitrary regression models and constraints like feature sparsity and actionable recourse, and furthermore can answer multiple counterfactual questions in parallel while learning from previous queries. We formulate CFE search for regression models in a rigorous mathematical framework using differentiable potentials, which resolves robustness issues in threshold-based objectives. We prove that in this framework, (a) verifying the existence of counterfactuals is NP-complete; and (b) that finding instances using such potentials is CLS-complete. We describe a unified algorithm for CFEs using a specialised acquisition function that composes both expected improvement and an exponential-polynomial (EP) family with desirable properties. Our evaluation on real-world benchmark domains demonstrate high sample-efficiency and precision.
AIJun 9, 2021
Non-Parametric Stochastic Sequential Assignment With Random Arrival TimesDanial Dervovic, Parisa Hassanzadeh, Samuel Assefa et al.
We consider a problem wherein jobs arrive at random times and assume random values. Upon each job arrival, the decision-maker must decide immediately whether or not to accept the job and gain the value on offer as a reward, with the constraint that they may only accept at most $n$ jobs over some reference time period. The decision-maker only has access to $M$ independent realisations of the job arrival process. We propose an algorithm, Non-Parametric Sequential Allocation (NPSA), for solving this problem. Moreover, we prove that the expected reward returned by the NPSA algorithm converges in probability to optimality as $M$ grows large. We demonstrate the effectiveness of the algorithm empirically on synthetic data and on public fraud-detection datasets, from where the motivation for this work is derived.
MEMay 27, 2021
Calibrating Over-Parametrized Simulation Models: A Framework via Eligibility SetYuanlu Bai, Tucker Balch, Haoxian Chen et al.
Stochastic simulation aims to compute output performance for complex models that lack analytical tractability. To ensure accurate prediction, the model needs to be calibrated and validated against real data. Conventional methods approach these tasks by assessing the model-data match via simple hypothesis tests or distance minimization in an ad hoc fashion, but they can encounter challenges arising from non-identifiability and high dimensionality. In this paper, we investigate a framework to develop calibration schemes that satisfy rigorous frequentist statistical guarantees, via a basic notion that we call eligibility set designed to bypass non-identifiability via a set-based estimation. We investigate a feature extraction-then-aggregation approach to construct these sets that target at multivariate outputs. We demonstrate our methodology on several numerical examples, including an application to calibration of a limit order book market simulator (ABIDES).