Bhaswar B. Bhattacharya

LG
3papers
58citations
Novelty50%
AI Score23

3 Papers

STAug 29, 2020
Estimation in Tensor Ising Models

Somabha Mukherjee, Jaesung Son, Bhaswar B. Bhattacharya

The $p$-tensor Ising model is a one-parameter discrete exponential family for modeling dependent binary data, where the sufficient statistic is a multi-linear form of degree $p \geq 2$. This is a natural generalization of the matrix Ising model, that provides a convenient mathematical framework for capturing higher-order dependencies in complex relational data. In this paper, we consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes. Our estimate is based on the maximum pseudo-likelihood (MPL) method, which provides a computationally efficient algorithm for estimating the parameter that avoids computing the intractable partition function. We derive general conditions under which the MPL estimate is $\sqrt N$-consistent, that is, it converges to the true parameter at rate $1/\sqrt N$. In particular, we show the $\sqrt N$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model, spin systems on general $p$-uniform hypergraphs, and Ising models on the hypergraph stochastic block model (HSBM). In fact, for the HSBM we pin down the exact location of the phase transition threshold, which is determined by the positivity of a certain mean-field variational problem, such that above this threshold the MPL estimate is $\sqrt N$-consistent, while below the threshold no estimator is consistent. Finally, we derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model. An interesting consequence of our results is that the MPL estimate in the Curie-Weiss model saturates the Cramer-Rao lower bound at all points above the estimation threshold, that is, the MPL estimate incurs no loss in asymptotic efficiency, even though it is obtained by minimizing only an approximation of the true likelihood function for computational tractability.

LGApr 17, 2015
Testing Closeness With Unequal Sized Samples

Bhaswar B. Bhattacharya, Gregory Valiant

We consider the problem of closeness testing for two discrete distributions in the practically relevant setting of \emph{unequal} sized samples drawn from each of them. Specifically, given a target error parameter $\varepsilon > 0$, $m_1$ independent draws from an unknown distribution $p,$ and $m_2$ draws from an unknown distribution $q$, we describe a test for distinguishing the case that $p=q$ from the case that $||p-q||_1 \geq \varepsilon$. If $p$ and $q$ are supported on at most $n$ elements, then our test is successful with high probability provided $m_1\geq n^{2/3}/\varepsilon^{4/3}$ and $m_2 = Ω(\max\{\frac{n}{\sqrt m_1\varepsilon^2}, \frac{\sqrt n}{\varepsilon^2}\});$ we show that this tradeoff is optimal throughout this range, to constant factors. These results extend the recent work of Chan et al. who established the sample complexity when the two samples have equal sizes, and tightens the results of Acharya et al. by polynomials factors in both $n$ and $\varepsilon$. As a consequence, we obtain an algorithm for estimating the mixing time of a Markov chain on $n$ states up to a $\log n$ factor that uses $\tilde{O}(n^{3/2} τ_{mix})$ queries to a "next node" oracle, improving upon the $\tilde{O}(n^{5/3}τ_{mix})$ query algorithm of Batu et al. Finally, we note that the core of our testing algorithm is a relatively simple statistic that seems to perform well in practice, both on synthetic data and on natural language data.

PRNov 18, 2013
Collision Times in Multicolor Urn Models and Sequential Graph Coloring With Applications to Discrete Logarithms

Bhaswar B. Bhattacharya

Consider an urn model where at each step one of $q$ colors is sampled according to some probability distribution and a ball of that color is placed in an urn. The distribution of assigning balls to urns may depend on the color of the ball. Collisions occur when a ball is placed in an urn which already contains a ball of different color. Equivalently, this can be viewed as sequentially coloring a complete $q$-partite graph wherein a collision corresponds to the appearance of a monochromatic edge. Using a Poisson embedding technique, the limiting distribution of the first collision time is determined and the possible limits are explicitly described. Joint distribution of successive collision times and multi-fold collision times are also derived. The results can be used to obtain the limiting distributions of running times in various birthday problem based algorithms for solving the discrete logarithm problem, generalizing previous results which only consider expected running times. Asymptotic distributions of the time of appearance of a monochromatic edge are also obtained for other graphs.