LGFeb 10, 2023
On the Interventional Kullback-Leibler DivergenceJonas Wildberger, Siyuan Guo, Arnab Bhattacharyya et al.
Modern machine learning approaches excel in static settings where a large amount of i.i.d. training data are available for a given task. In a dynamic environment, though, an intelligent agent needs to be able to transfer knowledge and re-use learned components across domains. It has been argued that this may be possible through causal models, aiming to mirror the modularity of the real world in terms of independent causal mechanisms. However, the true causal structure underlying a given set of data is generally not identifiable, so it is desirable to have means to quantify differences between models (e.g., between the ground truth and an estimate), on both the observational and interventional level. In the present work, we introduce the Interventional Kullback-Leibler (IKL) divergence to quantify both structural and distributional differences between models based on a finite set of multi-environment distributions generated by interventions from the ground truth. Since we generally cannot quantify all differences between causal models for every finite set of interventional distributions, we propose a sufficient condition on the intervention targets to identify subsets of observed variables on which the models provably agree or disagree.
LGJan 1, 2023
An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal EffectsThanh Vinh Vo, Arnab Bhattacharyya, Young Lee et al.
We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients, and hence requiring no prior information about the similarity measures. The heterogeneous causal effects can be estimated with no sharing of the raw training data among the sources, thus minimizing the risk of privacy leak. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
LGJun 30, 2022
Verification and search algorithms for causal DAGsDavin Choo, Kirankumar Shiragur, Arnab Bhattacharyya
We study two problems related to recovering causal graphs from interventional data: (i) $\textit{verification}$, where the task is to check if a purported causal graph is correct, and (ii) $\textit{search}$, where the task is to recover the correct causal graph. For both, we wish to minimize the number of interventions performed. For the first problem, we give a characterization of a minimal sized set of atomic interventions that is necessary and sufficient to check the correctness of a claimed causal graph. Our characterization uses the notion of $\textit{covered edges}$, which enables us to obtain simple proofs and also easily reason about earlier known results. We also generalize our results to the settings of bounded size interventions and node-dependent interventional costs. For all the above settings, we provide the first known provable algorithms for efficiently computing (near)-optimal verifying sets on general graphs. For the second problem, we give a simple adaptive algorithm based on graph separators that produces an atomic intervention set which fully orients any essential graph while using $\mathcal{O}(\log n)$ times the optimal number of interventions needed to $\textit{verify}$ (verifying size) the underlying DAG on $n$ vertices. This approximation is tight as $\textit{any}$ search algorithm on an essential line graph has worst case approximation ratio of $Ω(\log n)$ with respect to the verifying size. With bounded size interventions, each of size $\leq k$, our algorithm gives an $\mathcal{O}(\log n \cdot \log k)$ factor approximation. Our result is the first known algorithm that gives a non-trivial approximation guarantee to the verifying size on general unweighted graphs and with bounded size interventions.
DSApr 19, 2022
Independence Testing for Bounded Degree Bayesian NetworkArnab Bhattacharyya, Clément L. Canonne, Joy Qiping Yang
We study the following independence testing problem: given access to samples from a distribution $P$ over $\{0,1\}^n$, decide whether $P$ is a product distribution or whether it is $\varepsilon$-far in total variation distance from any product distribution. For arbitrary distributions, this problem requires $\exp(n)$ samples. We show in this work that if $P$ has a sparse structure, then in fact only linearly many samples are required. Specifically, if $P$ is Markov with respect to a Bayesian network whose underlying DAG has in-degree bounded by $d$, then $\tildeΘ(2^{d/2}\cdot n/\varepsilon^2)$ samples are necessary and sufficient for independence testing.
DSSep 17, 2023
Total Variation Distance Meets Probabilistic InferenceArnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel et al.
In this paper, we establish a novel connection between total variation (TV) distance estimation and probabilistic inference. In particular, we present an efficient, structure-preserving reduction from relative approximation of TV distance to probabilistic inference over directed graphical models. This reduction leads to a fully polynomial randomized approximation scheme (FPRAS) for estimating TV distances between same-structure distributions over any class of Bayes nets for which there is an efficient probabilistic inference algorithm. In particular, it leads to an FPRAS for estimating TV distances between distributions that are defined over a common Bayes net of small treewidth. Prior to this work, such approximation schemes only existed for estimating TV distances between product distributions. Our approach employs a new notion of $partial$ couplings of high-dimensional distributions, which might be of independent interest.
LGOct 10, 2023
Learning bounded-degree polytrees with known skeletonDavin Choo, Joy Qiping Yang, Arnab Bhattacharyya et al.
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters are nearly tight.
LGApr 13, 2023
Near-Optimal Degree Testing for Bayes NetsVipul Arora, Arnab Bhattacharyya, Clément L. Canonne et al.
This paper considers the problem of testing the maximum in-degree of the Bayes net underlying an unknown probability distribution $P$ over $\{0,1\}^n$, given sample access to $P$. We show that the sample complexity of the problem is $\tildeΘ(2^{n/2}/\varepsilon^2)$. Our algorithm relies on a testing-by-learning framework, previously used to obtain sample-optimal testers; in order to apply this framework, we develop new algorithms for ``near-proper'' learning of Bayes nets, and high-probability learning under $χ^2$ divergence, which are of independent interest.
LGNov 13, 2025
Product distribution learning with imperfect adviceArnab Bhattacharyya, Davin Choo, Philips George John et al.
Given i.i.d.~samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. When $P$ belongs to the class of product distributions on the Boolean hypercube $\{0,1\}^d$, it is known that $Ω(d/\varepsilon^2)$ samples are necessary to learn $P$ within total variation (TV) distance $\varepsilon$. We revisit this problem when the learner is also given as advice the parameters of a product distribution $Q$. We show that there is an efficient algorithm to learn $P$ within TV distance $\varepsilon$ that has sample complexity $\tilde{O}(d^{1-η}/\varepsilon^2)$, if $\|\mathbf{p} - \mathbf{q}\|_1 < \varepsilon d^{0.5 - Ω(η)}$. Here, $\mathbf{p}$ and $\mathbf{q}$ are the mean vectors of $P$ and $Q$ respectively, and no bound on $\|\mathbf{p} - \mathbf{q}\|_1$ is known to the algorithm a priori.
LGJul 1, 2024
Learnability of Parameter-Bounded Bayes NetsArnab Bhattacharyya, Davin Choo, Sutanu Gayen et al.
Bayes nets are extensively used in practice to efficiently represent joint probability distributions over a set of random variables and capture dependency relations. In a seminal paper, Chickering et al. (JMLR 2004) showed that given a distribution $\mathbb{P}$, that is defined as the marginal distribution of a Bayes net, it is $\mathsf{NP}$-hard to decide whether there is a parameter-bounded Bayes net that represents $\mathbb{P}$. They called this problem LEARN. In this work, we extend the $\mathsf{NP}$-hardness result of LEARN and prove the $\mathsf{NP}$-hardness of a promise search variant of LEARN, whereby the Bayes net in question is guaranteed to exist and one is asked to find such a Bayes net. We complement our hardness result with a positive result about the sample complexity that is sufficient to recover a parameter-bounded Bayes net that is close (in TV distance) to a given distribution $\mathbb{P}$, that is represented by some parameter-bounded Bayes net, generalizing a degree-bounded sample complexity result of Brustle et al. (EC 2020).
3.2DSMar 30
Testing Sparse Functions over the RealsVipul Arora, Arnab Bhattacharyya, Philips George John et al.
Over the last three decades, function testing has been extensively studied over Boolean, finite fields, and discrete settings. However, to encode the real-world applications more succinctly, function testing over the reals (where the domain and range, both are reals) is of prime importance. Recently, there have been some works in the direction of testing for algebraic representations of such functions: the work by Fleming and Yoshida (ITCS 20), Arora, Kelman, and Meir (SOSA 25) on linearity testing and the work of Arora, Bhattacharyya, Fleming, Kelman, and Yoshida (SODA 23) for testing low-degree polynomials. Our work follows the same avenue, wherein we study three well-studied sparse representations of functions, over the reals, namely (i) $k$-linearity, (ii) $k$-sparse polynomials, and (iii) $k$-junta. In this setting, given approximate query access to some $f:\mathbb{R}^n \rightarrow \mathbb{R}$, we want to decide if the function satisfies some property of interest, or if it is far from all functions that satisfy the property. Here, the distance is measured in the $\ell_1$-metric, under the assumption that we are drawing samples from the Standard Gaussian distribution. We present efficient testers and $Ω(k)$ lower bounds for testing each of these three properties.
LGMay 16, 2024
Online bipartite matching with imperfect adviceDavin Choo, Themis Gouleakis, Chun Kai Ling et al.
We study the problem of online unweighted bipartite matching with $n$ offline vertices and $n$ online vertices where one wishes to be competitive against the optimal offline algorithm. While the classic RANKING algorithm of Karp et al. [1990] provably attains competitive ratio of $1-1/e > 1/2$, we show that no learning-augmented method can be both 1-consistent and strictly better than $1/2$-robust under the adversarial arrival model. Meanwhile, under the random arrival model, we show how one can utilize methods from distribution testing to design an algorithm that takes in external advice about the online vertices and provably achieves competitive ratio interpolating between any ratio attainable by advice-free methods and the optimal ratio of 1, depending on the advice quality.
LGNov 19, 2024
Learning multivariate Gaussians with imperfect adviceArnab Bhattacharyya, Davin Choo, Philips George John et al.
We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this outcome is achievable for the problem of learning a multivariate Gaussian distribution $N(\boldsymbolμ, \boldsymbolΣ)$ in the PAC learning setting. Classically, in the advice-free setting, $\tildeΘ(d^2/\varepsilon^2)$ samples are sufficient and worst case necessary to learn $d$-dimensional Gaussians up to TV distance $\varepsilon$ with constant probability. When we are additionally given a parameter $\tilde{\boldsymbolΣ}$ as advice, we show that $\tilde{O}(d^{2-β}/\varepsilon^2)$ samples suffices whenever $\| \tilde{\boldsymbolΣ}^{-1/2} \boldsymbolΣ \tilde{\boldsymbolΣ}^{-1/2} - \boldsymbol{I_d} \|_1 \leq \varepsilon d^{1-β}$ (where $\|\cdot\|_1$ denotes the entrywise $\ell_1$ norm) for any $β> 0$, yielding a polynomial improvement over the advice-free setting.
LGFeb 9, 2024
Optimal estimation of Gaussian (poly)treesYuhao Wang, Ming Gao, Wai Ming Tai et al.
We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery). The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently. The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning. We derive explicit finite-sample guarantees for both approaches, and show that both approaches are optimal by deriving matching lower bounds. Additionally, we conduct numerical experiments to compare the performance of various algorithms, providing further insights and empirical evidence.
LGApr 28, 2025
Learning High-dimensional Gaussians from Censored DataArnab Bhattacharyya, Constantinos Daskalakis, Themis Gouleakis et al.
We provide efficient algorithms for the problem of distribution learning from high-dimensional Gaussian data where in each sample, some of the variable values are missing. We suppose that the variables are missing not at random (MNAR). The missingness model, denoted by $S(y)$, is the function that maps any point $y$ in $R^d$ to the subsets of its coordinates that are seen. In this work, we assume that it is known. We study the following two settings: (i) Self-censoring: An observation $x$ is generated by first sampling the true value $y$ from a $d$-dimensional Gaussian $N(μ*, Σ*)$ with unknown $μ*$ and $Σ*$. For each coordinate $i$, there exists a set $S_i$ subseteq $R^d$ such that $x_i = y_i$ if and only if $y_i$ in $S_i$. Otherwise, $x_i$ is missing and takes a generic value (e.g., "?"). We design an algorithm that learns $N(μ*, Σ*)$ up to total variation (TV) distance epsilon, using $poly(d, 1/ε)$ samples, assuming only that each pair of coordinates is observed with sufficiently high probability. (ii) Linear thresholding: An observation $x$ is generated by first sampling $y$ from a $d$-dimensional Gaussian $N(μ*, Σ)$ with unknown $μ*$ and known $Σ$, and then applying the missingness model $S$ where $S(y) = {i in [d] : v_i^T y <= b_i}$ for some $v_1, ..., v_d$ in $R^d$ and $b_1, ..., b_d$ in $R$. We design an efficient mean estimation algorithm, assuming that none of the possible missingness patterns is very rare conditioned on the values of the observed coordinates and that any small subset of coordinates is observed with sufficiently high probability.
DSMar 14, 2025
Approximating the Total Variation Distance between GaussiansArnab Bhattacharyya, Weiming Feng, Piyush Srivastava
The total variation distance is a metric of central importance in statistics and probability theory. However, somewhat surprisingly, questions about computing it algorithmically appear not to have been systematically studied until very recently. In this paper, we contribute to this line of work by studying this question in the important special case of multivariate Gaussians. More formally, we consider the problem of approximating the total variation distance between two multivariate Gaussians to within an $ε$-relative error. Previous works achieved a fixed constant relative error approximation via closed-form formulas. In this work, we give algorithms that given any two $n$-dimensional Gaussians $D_1,D_2$, and any error bound $ε> 0$, approximate the total variation distance $D := d_{TV}(D_1,D_2)$ to $ε$-relative accuracy in $\text{poly}(n,\frac{1}ε,\log \frac{1}{D})$ operations. The main technical tool in our work is a reduction that helps us extend the recent progress on computing the TV-distance between discrete random variables to our continuous setting.
LGNov 16, 2024
Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPsPhilips George John, Arnab Bhattacharyya, Silviu Maniu et al.
Reinforcement learning algorithms are usually stated without theoretical guarantees regarding their performance. Recently, Jin, Yang, Wang, and Jordan (COLT 2020) showed a polynomial-time reinforcement learning algorithm (namely, LSVI-UCB) for the setting of linear Markov decision processes, and provided theoretical guarantees regarding its running time and regret. In real-world scenarios, however, the space usage of this algorithm can be prohibitive due to a utilized linear regression step. We propose and analyze two modifications of LSVI-UCB, which alternate periods of learning and not-learning, to reduce space and time usage while maintaining sublinear regret. We show experimentally, on synthetic data and real-world benchmarks, that our algorithms achieve low space usage and running time, while not significantly sacrificing regret.
LGMay 13, 2024
Distribution Learning Meets Graph Structure SamplingArnab Bhattacharyya, Sutanu Gayen, Philips George John et al.
This work establishes a novel link between the problem of PAC-learning high-dimensional graphical models and the task of (efficient) counting and sampling of graph structures, using an online learning framework. We observe that if we apply the exponentially weighted average (EWA) or randomized weighted majority (RWM) forecasters on a sequence of samples from a distribution P using the log loss function, the average regret incurred by the forecaster's predictions can be used to bound the expected KL divergence between P and the predictions. Known regret bounds for EWA and RWM then yield new sample complexity bounds for learning Bayes nets. Moreover, these algorithms can be made computationally efficient for several interesting classes of Bayes nets. Specifically, we give a new sample-optimal and polynomial time learning algorithm with respect to trees of unknown structure and the first polynomial sample and time algorithm for learning with respect to Bayes nets over a given chordal skeleton.
DSMar 14, 2024
Outlier Robust Multivariate Polynomial RegressionVipul Arora, Arnab Bhattacharyya, Mathews Boban et al.
We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $χ$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $ρ< 1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leqσ$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(σ)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $χ$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$, if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(σ)$, and the run-time depends on $\log(1/σ)$. In the setting where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/σ$, at the cost of a higher sample complexity.
LGMay 31, 2023
Active causal structure learning with adviceDavin Choo, Themis Gouleakis, Arnab Bhattacharyya
We introduce the problem of active causal structure learning with advice. In the typical well-studied setting, the learning algorithm is given the essential graph for the observational distribution and is asked to recover the underlying causal directed acyclic graph (DAG) $G^*$ while minimizing the number of interventions made. In our setting, we are additionally given side information about $G^*$ as advice, e.g. a DAG $G$ purported to be $G^*$. We ask whether the learning algorithm can benefit from the advice when it is close to being correct, while still having worst-case guarantees even when the advice is arbitrarily bad. Our work is in the same space as the growing body of research on algorithms with predictions. When the advice is a DAG $G$, we design an adaptive search algorithm to recover $G^*$ whose intervention cost is at most $O(\max\{1, \log ψ\})$ times the cost for verifying $G^*$; here, $ψ$ is a distance measure between $G$ and $G^*$ that is upper bounded by the number of variables $n$, and is exactly 0 when $G=G^*$. Our approximation factor matches the state-of-the-art for the advice-less setting.
DSJul 25, 2021
Efficient inference of interventional distributionsArnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy et al.
We consider the problem of efficiently inferring interventional distributions in a causal Bayesian network from a finite number of observations. Let $\mathcal{P}$ be a causal model on a set $\mathbf{V}$ of observable variables on a given causal graph $G$. For sets $\mathbf{X},\mathbf{Y}\subseteq \mathbf{V}$, and setting ${\bf x}$ to $\mathbf{X}$, let $P_{\bf x}(\mathbf{Y})$ denote the interventional distribution on $\mathbf{Y}$ with respect to an intervention ${\bf x}$ to variables ${\bf x}$. Shpitser and Pearl (AAAI 2006), building on the work of Tian and Pearl (AAAI 2001), gave an exact characterization of the class of causal graphs for which the interventional distribution $P_{\bf x}({\mathbf{Y}})$ can be uniquely determined. We give the first efficient version of the Shpitser-Pearl algorithm. In particular, under natural assumptions, we give a polynomial-time algorithm that on input a causal graph $G$ on observable variables $\mathbf{V}$, a setting ${\bf x}$ of a set $\mathbf{X} \subseteq \mathbf{V}$ of bounded size, outputs succinct descriptions of both an evaluator and a generator for a distribution $\hat{P}$ that is $\varepsilon$-close (in total variation distance) to $P_{\bf x}({\mathbf{Y}})$ where $Y=\mathbf{V}\setminus \mathbf{X}$, if $P_{\bf x}(\mathbf{Y})$ is identifiable. We also show that when $\mathbf{Y}$ is an arbitrary set, there is no efficient algorithm that outputs an evaluator of a distribution that is $\varepsilon$-close to $P_{\bf x}({\mathbf{Y}})$ unless all problems that have statistical zero-knowledge proofs, including the Graph Isomorphism problem, have efficient randomized algorithms.
DSJul 22, 2021
Learning Sparse Fixed-Structure Gaussian Bayesian NetworksArnab Bhattacharyya, Davin Choo, Rishikesh Gajjala et al.
Gaussian Bayesian networks (a.k.a. linear Gaussian structural equation models) are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression (LeastSquares) and prove that it has a near-optimal sample complexity. We also study a couple of new algorithms for the problem: - BatchAvgLeastSquares takes the average of several batches of least squares solutions at each node, so that one can interpolate between the batch size and the number of batches. We show that BatchAvgLeastSquares also has near-optimal sample complexity. - CauchyEst takes the median of solutions to several batches of linear systems at each node. We show that the algorithm specialized to polytrees, CauchyEstTree, has near-optimal sample complexity. Experimentally, we show that for uncontaminated, realizable data, the LeastSquares algorithm performs best, but in the presence of contamination or DAG misspecification, CauchyEst/CauchyEstTree and BatchAvgLeastSquares respectively perform better.
DSJun 17, 2021
Identifiability of AMP chain graph modelsYuhao Wang, Arnab Bhattacharyya
We study identifiability of Andersson-Madigan-Perlman (AMP) chain graph models, which are a common generalization of linear structural equation models and Gaussian graphical models. AMP models are described by DAGs on chain components which themselves are undirected graphs. For a known chain component decomposition, we show that the DAG on the chain components is identifiable if the determinants of the residual covariance matrices of the chain components are monotone non-decreasing in topological order. This condition extends the equal variance identifiability criterion for Bayes nets, and it can be generalized from determinants to any super-additive function on positive semidefinite matrices. When the component decomposition is unknown, we describe conditions that allow recovery of the full structure using a polynomial time algorithm based on submodular function minimization. We also conduct experiments comparing our algorithm's performance against existing baselines.
DSDec 29, 2020
Testing Product Distributions: A Closer LookArnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy et al.
We study the problems of identity and closeness testing of $n$-dimensional product distributions. Prior works by Canonne, Diakonikolas, Kane and Stewart (COLT 2017) and Daskalakis and Pan (COLT 2017) have established tight sample complexity bounds for non-tolerant testing over a binary alphabet: given two product distributions $P$ and $Q$ over a binary alphabet, distinguish between the cases $P = Q$ and $d_{\mathrm{TV}}(P, Q) > ε$. We build on this prior work to give a more comprehensive map of the complexity of testing of product distributions by investigating tolerant testing with respect to several natural distance measures and over an arbitrary alphabet. Our study gives a fine-grained understanding of how the sample complexity of tolerant testing varies with the distance measures for product distributions. In addition, we also extend one of our upper bounds on product distributions to bounded-degree Bayes nets.
NINov 27, 2020
Eco-Routing Using Open Street MapsR K Ghosh, Vinay R, Arnab Bhattacharyya
A vehicle's fuel consumption depends on its type, the speed, the condition, and the gradients of the road on which it is moving. We developed a Routing Engine for finding an eco-route (one with low fuel consumption) between a source and a destination. Open Street Maps has data on road conditions. We used CGIAR-CSI road elevation data 16[4] to integrate the road gradients into the proposed route-finding algorithm that modifies Open Street Routing Machine (OSRM). It allowed us to dynamically predict a vehicle's velocity, considering both the conditions and the road segment's slope. Using Highway EneRgy Assessment (HERA) methodology, we calculated the fuel consumed by a vehicle given its type and velocity. We have created both web and mobile interfaces through which users can specify Geo coordinates or human-readable addresses of a source and a destination. The user interface graphically displays the route obtained from the proposed Routing Engine with a detailed travel itinerary.
DSNov 9, 2020
Near-Optimal Learning of Tree-Structured Distributions by Chow-LiuArnab Bhattacharyya, Sutanu Gayen, Eric Price et al.
We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans.~Inform.~Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution $P$ on $Σ^n$ and a tree $T$ on $n$ nodes, we say $T$ is an $\varepsilon$-approximate tree for $P$ if there is a $T$-structured distribution $Q$ such that $D(P\;||\;Q)$ is at most $\varepsilon$ more than the best possible tree-structured distribution for $P$. We show that if $P$ itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information with $\widetilde{O}(|Σ|^3 n\varepsilon^{-1})$ i.i.d.~samples outputs an $\varepsilon$-approximate tree for $P$ with constant probability. In contrast, for a general $P$ (which may not be tree-structured), $Ω(n^2\varepsilon^{-2})$ samples are necessary to find an $\varepsilon$-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart~(STOC, 2018): we prove that for three random variables $X,Y,Z$ each over $Σ$, testing if $I(X; Y \mid Z)$ is $0$ or $\geq \varepsilon$ is possible with $\widetilde{O}(|Σ|^3/\varepsilon)$ samples. Finally, we show that for a specific tree $T$, with $\widetilde{O} (|Σ|^2n\varepsilon^{-1})$ samples from a distribution $P$ over $Σ^n$, one can efficiently learn the closest $T$-structured distribution in KL divergence by applying the add-1 estimator at each node.
MLJun 17, 2020
Efficient Statistics for Sparse Graphical Models from Truncated SamplesArnab Bhattacharyya, Rathin Desai, Sai Ganesh Nagarajan et al.
In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose $d$-dimensional samples ${\bf x}$ are generated from a Gaussian $N(μ,Σ)$ and observed only if they belong to a subset $S \subseteq \mathbb{R}^d$. We show that $μ$ and $Σ$ can be estimated with error $ε$ in the Frobenius norm, using $\tilde{O}\left(\frac{\textrm{nz}(Σ^{-1})}{ε^2}\right)$ samples from a truncated $\mathcal{N}(μ,Σ)$ and having access to a membership oracle for $S$. The set $S$ is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples $({\bf x},y)$ are generated where $y = {\bf x}^\top{Ω^*} + \mathcal{N}(0,1)$ and $({\bf x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}$. We consider the case that $Ω^*$ is sparse with a support set of size $k$. Our main result is to establish precise conditions on the problem dimension $d$, the support size $k$, the number of observations $n$, and properties of the samples and the truncation that are sufficient to recover the support of $Ω^*$. Specifically, we show that under some mild assumptions, only $O(k^2 \log d)$ samples are needed to estimate $Ω^*$ in the $\ell_\infty$-norm up to a bounded error. For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and an $\ell_1$-regularization term.
DSFeb 13, 2020
Efficient Distance Approximation for Structured High-Dimensional Distributions via LearningArnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel et al.
We design efficient distance approximation algorithms for several classes of structured high-dimensional distributions. Specifically, we show algorithms for the following problems: - Given sample access to two Bayesian networks $P_1$ and $P_2$ over known directed acyclic graphs $G_1$ and $G_2$ having $n$ nodes and bounded in-degree, approximate $d_{tv}(P_1,P_2)$ to within additive error $ε$ using $poly(n,ε)$ samples and time - Given sample access to two ferromagnetic Ising models $P_1$ and $P_2$ on $n$ variables with bounded width, approximate $d_{tv}(P_1, P_2)$ to within additive error $ε$ using $poly(n,ε)$ samples and time - Given sample access to two $n$-dimensional Gaussians $P_1$ and $P_2$, approximate $d_{tv}(P_1, P_2)$ to within additive error $ε$ using $poly(n,ε)$ samples and time - Given access to observations from two causal models $P$ and $Q$ on $n$ variables that are defined over known causal graphs, approximate $d_{tv}(P_a, Q_a)$ to within additive error $ε$ using $poly(n,ε)$ samples, where $P_a$ and $Q_a$ are the interventional distributions obtained by the intervention $do(A=a)$ on $P$ and $Q$ respectively for a particular variable $A$. Our results are the first efficient distance approximation algorithms for these well-studied problems. They are derived using a simple and general connection to distribution learning algorithms. The distance approximation algorithms imply new efficient algorithms for {\em tolerant} testing of closeness of the above-mentioned structured high-dimensional distributions.
LGFeb 11, 2020
Learning and Sampling of Atomic Interventions from ObservationsArnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy et al.
We study the problem of efficiently estimating the effect of an intervention on a single variable (atomic interventions) using observational samples in a causal Bayesian network. Our goal is to give algorithms that are efficient in both time and sample complexity in a non-parametric setting. Tian and Pearl (AAAI `02) have exactly characterized the class of causal graphs for which causal effects of atomic interventions can be identified from observational data. We make their result quantitative. Suppose P is a causal model on a set $\vec{V}$ of n observable variables with respect to a given causal graph G with observable distribution $P$. Let $P_x$ denote the interventional distribution over the observables with respect to an intervention of a designated variable X with x. Assuming that $G$ has bounded in-degree, bounded c-components ($k$), and that the observational distribution is identifiable and satisfies certain strong positivity condition, we give an algorithm that takes $m=\tilde{O}(nε^{-2})$ samples from $P$ and $O(mn)$ time, and outputs with high probability a description of a distribution $\hat{P}$ such that $d_{\mathrm{TV}}(P_x, \hat{P}) \leq ε$, and: 1. [Evaluation] the description can return in $O(n)$ time the probability $\hat{P}(\vec{v})$ for any assignment $\vec{v}$ to $\vec{V}$ 2. [Generation] the description can return an iid sample from $\hat{P}$ in $O(n)$ time. We also show lower bounds for the sample complexity showing that our sample complexity has an optimal dependence on the parameters $n$ and $ε$, as well as if $k=1$ on the strong positivity parameter.
DSMay 24, 2018
Learning and Testing Causal Models with InterventionsJayadev Acharya, Arnab Bhattacharyya, Constantinos Daskalakis et al.
We consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009). Given a causal Bayesian network $\mathcal{M}$ on a graph with $n$ discrete variables and bounded in-degree and bounded `confounded components', we show that $O(\log n)$ interventions on an unknown causal Bayesian network $\mathcal{X}$ on the same graph, and $\tilde{O}(n/ε^2)$ samples per intervention, suffice to efficiently distinguish whether $\mathcal{X}=\mathcal{M}$ or whether there exists some intervention under which $\mathcal{X}$ and $\mathcal{M}$ are farther than $ε$ in total variation distance. We also obtain sample/time/intervention efficient algorithms for: (i) testing the identity of two unknown causal Bayesian networks on the same graph; and (ii) learning a causal Bayesian network on a given graph. Although our algorithms are non-adaptive, we show that adaptivity does not help in general: $Ω(\log n)$ interventions are necessary for testing the identity of two unknown causal Bayesian networks on the same graph, even adaptively. Our algorithms are enabled by a new subadditivity inequality for the squared Hellinger distance between two causal Bayesian networks.
CCAug 19, 2015
Fishing out Winners from Vote StreamsArnab Bhattacharyya, Palash Dey
We investigate the problem of winner determination from computational social choice theory in the data stream model. Specifically, we consider the task of summarizing an arbitrarily ordered stream of $n$ votes on $m$ candidates into a small space data structure so as to be able to obtain the winner determined by popular voting rules. As we show, finding the exact winner requires storing essentially all the votes. So, we focus on the problem of finding an {\em $\eps$-winner}, a candidate who could win by a change of at most $\eps$ fraction of the votes. We show non-trivial upper and lower bounds on the space complexity of $\eps$-winner determination for several voting rules, including $k$-approval, $k$-veto, scoring rules, approval, maximin, Bucklin, Copeland, and plurality with run off.
DSFeb 18, 2015
On learning k-parities with and without noiseArnab Bhattacharyya, Ameet Gadekar, Ninad Rajgopal
We first consider the problem of learning $k$-parities in the on-line mistake-bound model: given a hidden vector $x \in \{0,1\}^n$ with $|x|=k$ and a sequence of "questions" $a_1, a_2, ...\in \{0,1\}^n$, where the algorithm must reply to each question with $< a_i, x> \pmod 2$, what is the best tradeoff between the number of mistakes made by the algorithm and its time complexity? We improve the previous best result of Buhrman et al. by an $\exp(k)$ factor in the time complexity. Second, we consider the problem of learning $k$-parities in the presence of classification noise of rate $η\in (0,1/2)$. A polynomial time algorithm for this problem (when $η> 0$ and $k = ω(1)$) is a longstanding challenge in learning theory. Grigorescu et al. showed an algorithm running in time ${n \choose k/2}^{1 + 4η^2 +o(1)}$. Note that this algorithm inherently requires time ${n \choose k/2}$ even when the noise rate $η$ is polynomially small. We observe that for sufficiently small noise rate, it is possible to break the $n \choose k/2$ barrier. In particular, if for some function $f(n) = ω(1)$ and $α\in [1/2, 1)$, $k = n/f(n)$ and $η= o(f(n)^{- α}/\log n)$, then there is an algorithm for the problem with running time $poly(n)\cdot {n \choose k}^{1-α} \cdot e^{-k/4.01}$.
DSNov 10, 2014
An explicit sparse recovery scheme in the L1-normArnab Bhattacharyya, Vineet Nair
Consider the approximate sparse recovery problem: given Ax, where A is a known m-by-n dimensional matrix and x is an unknown (approximately) sparse n-dimensional vector, recover an approximation to x. The goal is to design the matrix A such that m is small and recovery is efficient. Moreover, it is often desirable for A to have other nice properties, such as explicitness, sparsity, and discreteness. In this work, we show that we can use spectral expander graphs to explicitly design binary matrices A for which the column sparsity is optimal and for which there is an efficient recovery algorithm (l1-minimization). In order to recover x that is close to δn-sparse (where δ is a constant), we design an explicit binary matrix A that has m = O(sqrt{δ} log(1/δ) * n) rows and has O(log(1/δ)) ones in each column. Previous such constructions were based on unbalanced bipartite graphs with high vertex expansion, for which we currently do not have explicit constructions. In particular, ours is the first explicit non-trivial construction of a measurement matrix A such that Ax can be computed in O(n log(1/δ)) time.