LGApr 28, 2023
Online Platt Scaling with CalibeatingChirag Gupta, Aaditya Ramdas
We present an online post-hoc calibration method, called Online Platt Scaling (OPS), which combines the Platt scaling technique with online logistic regression. We demonstrate that OPS smoothly adapts between i.i.d. and non-i.i.d. settings with distribution drift. Further, in scenarios where the best Platt scaling model is itself miscalibrated, we enhance OPS by incorporating a recently developed technique called calibeating to make it more robust. Theoretically, our resulting OPS+calibeating method is guaranteed to be calibrated for adversarial outcome sequences. Empirically, it is effective on a range of synthetic and real-world datasets, with and without distribution drifts, achieving superior performance without hyperparameter tuning. Finally, we extend all OPS ideas to the beta scaling method.
LGApr 27, 2022
Faster online calibration without randomization: interval forecasts and the power of two choicesChirag Gupta, Aaditya Ramdas
We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $ε$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $Ω(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster $ε$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.
SPAug 5, 2023
OrcoDCS: An IoT-Edge Orchestrated Online Deep Compressed Sensing FrameworkCheng-Wei Ching, Chirag Gupta, Zi Huang et al.
Compressed data aggregation (CDA) over wireless sensor networks (WSNs) is task-specific and subject to environmental changes. However, the existing compressed data aggregation (CDA) frameworks (e.g., compressed sensing-based data aggregation, deep learning(DL)-based data aggregation) do not possess the flexibility and adaptivity required to handle distinct sensing tasks and environmental changes. Additionally, they do not consider the performance of follow-up IoT data-driven deep learning (DL)-based applications. To address these shortcomings, we propose OrcoDCS, an IoT-Edge orchestrated online deep compressed sensing framework that offers high flexibility and adaptability to distinct IoT device groups and their sensing tasks, as well as high performance for follow-up applications. The novelty of our work is the design and deployment of IoT-Edge orchestrated online training framework over WSNs by leveraging an specially-designed asymmetric autoencoder, which can largely reduce the encoding overhead and improve the reconstruction performance and robustness. We show analytically and empirically that OrcoDCS outperforms the state-of-the-art DCDA on training time, significantly improves flexibility and adaptability when distinct reconstruction tasks are given, and achieves higher performance for follow-up applications.
LGJul 18, 2021Code
Top-label calibration and multiclass-to-binary reductionsChirag Gupta, Aaditya Ramdas
A multiclass classifier is said to be top-label calibrated if the reported probability for the predicted class -- the top-label -- is calibrated, conditioned on the top-label. This conditioning on the top-label is absent in the closely related and popular notion of confidence calibration, which we argue makes confidence calibration difficult to interpret for decision-making. We propose top-label calibration as a rectification of confidence calibration. Further, we outline a multiclass-to-binary (M2B) reduction framework that unifies confidence, top-label, and class-wise calibration, among others. As its name suggests, M2B works by reducing multiclass calibration to numerous binary calibration problems, each of which can be solved using simple binary calibration routines. We instantiate the M2B framework with the well-studied histogram binning (HB) binary calibrator, and prove that the overall procedure is multiclass calibrated without making any assumptions on the underlying data distribution. In an empirical evaluation with four deep net architectures on CIFAR-10 and CIFAR-100, we find that the M2B + HB procedure achieves lower top-label and class-wise calibration error than other approaches such as temperature scaling. Code for this work is available at \url{https://github.com/aigen/df-posthoc-calibration}.
MEMay 10, 2021Code
Distribution-free calibration guarantees for histogram binning without sample splittingChirag Gupta, Aaditya K. Ramdas
We prove calibration guarantees for the popular histogram binning (also called uniform-mass binning) method of Zadrozny and Elkan [2001]. Histogram binning has displayed strong practical performance, but theoretical guarantees have only been shown for sample split versions that avoid 'double dipping' the data. We demonstrate that the statistical cost of sample splitting is practically significant on a credit default dataset. We then prove calibration guarantees for the original method that double dips the data, using a certain Markov property of order statistics. Based on our results, we make practical recommendations for choosing the number of bins in histogram binning. In our illustrative simulations, we propose a new tool for assessing calibration -- validity plots -- which provide more information than an ECE estimate. Code for this work will be made publicly available at https://github.com/aigen/df-posthoc-calibration.
MEOct 23, 2019Code
Nested conformal prediction and quantile out-of-bag ensemble methodsChirag Gupta, Arun K. Kuchibhotla, Aaditya K. Ramdas
Conformal prediction is a popular tool for providing valid prediction sets for classification and regression problems, without relying on any distributional assumptions on the data. While the traditional description of conformal prediction starts with a nonconformity score, we provide an alternate (but equivalent) view that starts with a sequence of nested sets and calibrates them to find a valid prediction set. The nested framework subsumes all nonconformity scores, including recent proposals based on quantile regression and density estimation. While these ideas were originally derived based on sample splitting, our framework seamlessly extends them to other aggregation schemes like cross-conformal, jackknife+ and out-of-bag methods. We use the framework to derive a new algorithm (QOOB, pronounced cube) that combines four ideas: quantile regression, cross-conformalization, ensemble methods and out-of-bag predictions. We develop a computationally efficient implementation of cross-conformal, that is also used by QOOB. In a detailed numerical investigation, QOOB performs either the best or close to the best on all simulated and real datasets. Code for QOOB is available at https://github.com/aigen/QOOB.
CLMay 27, 2025
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause FrequenciesTerrance Liu, Shuyi Wang, Daniel Preotiuc-Pietro et al.
While large language models (LLMs) achieve strong performance on text-to-SQL parsing, they sometimes exhibit unexpected failures in which they are confidently incorrect. Building trustworthy text-to-SQL systems thus requires eliciting reliable uncertainty measures from the LLM. In this paper, we study the problem of providing a calibrated confidence score that conveys the likelihood of an output query being correct. Our work is the first to establish a benchmark for post-hoc calibration of LLM-based text-to-SQL parsing. In particular, we show that Platt scaling, a canonical method for calibration, provides substantial improvements over directly using raw model output probabilities as confidence scores. Furthermore, we propose a method for text-to-SQL calibration that leverages the structured nature of SQL queries to provide more granular signals of correctness, named "sub-clause frequency" (SCF) scores. Using multivariate Platt scaling (MPS), our extension of the canonical Platt scaling technique, we combine individual SCF scores into an overall accurate and calibrated score. Empirical evaluation on two popular text-to-SQL datasets shows that our approach of combining MPS and SCF yields further improvements in calibration and the related task of error detection over traditional Platt scaling.
HCAug 23, 2025
Does Calibration Affect Human Actions?Meir Nizri, Amos Azaria, Chirag Gupta et al.
Calibration has been proposed as a way to enhance the reliability and adoption of machine learning classifiers. We study a particular aspect of this proposal: how does calibrating a classification model affect the decisions made by non-expert humans consuming the model's predictions? We perform a Human-Computer-Interaction (HCI) experiment to ascertain the effect of calibration on (i) trust in the model, and (ii) the correlation between decisions and predictions. We also propose further corrections to the reported calibrated scores based on Kahneman and Tversky's prospect theory from behavioral economics, and study the effect of these corrections on trust and decision-making. We find that calibration is not sufficient on its own; the prospect theory correction is crucial for increasing the correlation between human decisions and the model's predictions. While this increased correlation suggests higher trust in the model, responses to ``Do you trust the model more?" are unaffected by the method used.
LGOct 23, 2025
Learning from Interval TargetsRattana Pukdee, Ziqi Ke, Chirag Gupta
We study the problem of regression with interval targets, where only upper and lower bounds on target values are available in the form of intervals. This problem arises when the exact target label is expensive or impossible to obtain, due to inherent uncertainties. In the absence of exact targets, traditional regression loss functions cannot be used. First, we study the methodology of using a loss functions compatible with interval targets, for which we establish non-asymptotic generalization bounds based on smoothness of the hypothesis class that significantly relaxing prior assumptions of realizability and small ambiguity degree. Second, we propose a novel min-max learning formulation: minimize against the worst-case (maximized) target labels within the provided intervals. The maximization problem in the latter is non-convex, but we show that good performance can be achieved with the incorporation of smoothness constraints. Finally, we perform extensive experiments on real-world datasets and show that our methods achieve state-of-the-art performance.
LGMay 29, 2023
Parity CalibrationYoungseog Chung, Aaron Rumack, Chirag Gupta
In a sequential regression setting, a decision-maker may be primarily concerned with whether the future observation will increase or decrease compared to the current one, rather than the actual value of the future observation. In this context, we introduce the notion of parity calibration, which captures the goal of calibrated forecasting for the increase-decrease (or "parity") event in a timeseries. Parity probabilities can be extracted from a forecasted distribution for the output, but we show that such a strategy leads to theoretical unpredictability and poor practical performance. We then observe that although the original task was regression, parity calibration can be expressed as binary calibration. Drawing on this connection, we use an online binary calibration method to achieve parity calibration. We demonstrate the effectiveness of our approach on real-world case studies in epidemiology, weather forecasting, and model-based control in nuclear fusion.
LGJan 24, 2021
Modern Machine and Deep Learning Systems as a way to achieve Man-Computer SymbiosisChirag Gupta
Man-Computer Symbiosis (MCS) was originally envisioned by the famous computer pioneer J.C.R. Licklider in 1960, as a logical evolution of the then inchoate relationship between computer and humans. In his paper, Licklider provided a set of criteria by which to judge if a Man-Computer System is a symbiotic one, and also provided some predictions about such systems in the near and far future. Since then, innovations in computer networks and the invention of the Internet were major developments towards that end. However, with most systems based on conventional logical algorithms, many aspects of Licklider's MCS remained unfulfilled. This paper explores the extent to which modern machine learning systems in general, and deep learning ones in particular best exemplify MCS systems, and why they are the prime contenders to achieve a true Man-Computer Symbiosis as described by Licklider in his original paper in the future. The case for deep learning is built by illustrating each point of the original criteria as well as the criteria laid by subsequent research into MCS systems, with specific examples and applications provided to strengthen the arguments. The efficacy of deep neural networks in achieving Artificial General Intelligence, which would be the perfect version of an MCS system is also explored.
MLJun 18, 2020
Distribution-free binary classification: prediction sets, confidence intervals and calibrationChirag Gupta, Aleksandr Podkopaev, Aaditya Ramdas
We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting, that is without making any distributional assumptions on the data. With a focus towards calibration, we establish a 'tripod' of theorems that connect these three notions for score-based classifiers. A direct implication is that distribution-free calibration is only possible, even asymptotically, using a scoring function whose level sets partition the feature space into at most countably many sets. Parametric calibration schemes such as variants of Platt scaling do not satisfy this requirement, while nonparametric schemes based on binning do. To close the loop, we derive distribution-free confidence intervals for binned probabilities for both fixed-width and uniform-mass binning. As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration. We also derive extensions to settings with streaming data and covariate shift.
LGJan 9, 2020
Shallow Encoder Deep Decoder (SEDD) Networks for Image Encryption and DecryptionChirag Gupta
This paper explores a new framework for lossy image encryption and decryption using a simple shallow encoder neural network E for encryption, and a complex deep decoder neural network D for decryption. E is kept simple so that encoding can be done on low power and portable devices and can in principle be any nonlinear function which outputs an encoded vector. D is trained to decode the encodings using the dataset of image - encoded vector pairs obtained from E and happens independently of E. As the encodings come from E which while being a simple neural network, still has thousands of random parameters and therefore the encodings would be practically impossible to crack without D. This approach differs from autoencoders as D is trained completely independently of E, although the structure may seem similar. Therefore, this paper also explores empirically if a deep neural network can learn to reconstruct the original data in any useful form given the output of a neural network or any other nonlinear function, which can have very useful applications in Cryptanalysis. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the decoded images from D along with some limitations.
ROJan 6, 2020
Self learning robot using real-time neural networksChirag Gupta, Chikita Nangia, Chetan Kumar
With the advancements in high volume, low precision computational technology and applied research on cognitive artificially intelligent heuristic systems, machine learning solutions through neural networks with real-time learning has seen an immense interest in the research community as well the industry. This paper involves research, development and experimental analysis of a neural network implemented on a robot with an arm through which evolves to learn to walk in a straight line or as required. The neural network learns using the algorithms of Gradient Descent and Backpropagation. Both the implementation and training of the neural network is done locally on the robot on a raspberry pi 3 so that its learning process is completely independent. The neural network is first tested on a custom simulator developed on MATLAB and then implemented on the raspberry computer. Data at each generation of the evolving network is stored, and analysis both mathematical and graphical is done on the data. Impact of factors like the learning rate and error tolerance on the learning process and final output is analyzed.
LGAug 2, 2019
Path Length Bounds for Gradient Descent and FlowChirag Gupta, Sivaraman Balakrishnan, Aaditya Ramdas
We derive bounds on the path length $ζ$ of gradient descent (GD) and gradient flow (GF) curves for various classes of smooth convex and nonconvex functions. Among other results, we prove that: (a) if the iterates are linearly convergent with factor $(1-c)$, then $ζ$ is at most $\mathcal{O}(1/c)$; (b) under the Polyak-Kurdyka-Lojasiewicz (PKL) condition, $ζ$ is at most $\mathcal{O}(\sqrtκ)$, where $κ$ is the condition number, and at least $\widetildeΩ(\sqrt{d} \wedge κ^{1/4})$; (c) for quadratics, $ζ$ is $Θ(\min\{\sqrt{d},\sqrt{\log κ}\})$ and in some cases can be independent of $κ$; (d) assuming just convexity, $ζ$ can be at most $2^{4d\log d}$; (e) for separable quasiconvex functions, $ζ$ is $Θ(\sqrt{d})$. Thus, we advance current understanding of the properties of GD and GF curves beyond rates of convergence. We expect our techniques to facilitate future studies for other algorithms.