Geoffrey Wolfer

ST
8 papers
133 citations
Novelty 48%
AI Score 39

8 Papers

ST Oct 13, 2022
Variance-Aware Estimation of Kernel Mean Embedding

Geoffrey Wolfer, Pierre Alquier

An important feature of kernel mean embeddings (KME) is that the rate of convergence of the empirical KME to the true distribution KME can be bounded independently of the dimension of the space, properties of the distribution, and smoothness features of the kernel. We show how to speed up convergence by leveraging variance information in the reproducing kernel Hilbert space. Furthermore, we show that even when such information is a priori unknown, we can efficiently estimate it from the data, recovering the desideratum of a distribution-agnostic bound that enjoys acceleration in fortuitous settings. We further extend our results from independent data to stationary mixing sequences and illustrate our methods in the context of hypothesis testing and robust parametric estimation.
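As a rough illustration of the objects in this abstract, the sketch below computes the squared RKHS distance between the empirical KMEs of two samples (the biased V-statistic form of the maximum mean discrepancy). The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration; this is not the variance-aware estimator of the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on real numbers; values lie in (0, 1]."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Squared RKHS distance between the empirical KMEs of xs and ys.

    Uses the biased V-statistic: mean k(x, x') - 2 mean k(x, y) + mean k(y, y').
    """
    def mean_kernel(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) - 2.0 * mean_kernel(xs, ys) + mean_kernel(ys, ys)


if __name__ == "__main__":
    sample_p = [0.0, 0.1, -0.2, 0.3]
    sample_q = [2.0, 2.1, 1.8, 2.3]
    print(mmd_squared(sample_p, sample_p))  # identical samples give zero distance
    print(mmd_squared(sample_p, sample_q))  # well-separated samples give a larger value
```

Because the kernel is bounded, each term is an average of bounded quantities, which is what makes dimension-free concentration of the empirical KME plausible in the first place.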

8.4 PR Mar 18
Geometry and factorization of multivariate Markov chains with applications to MCMC acceleration and approximate inference

Michael C. H. Choi, Youjia Wang, Geoffrey Wolfer

This paper analyzes the factorizability and geometry of transition matrices of multivariate Markov chains. Specifically, we demonstrate that the induced chains on factors of a product space can be regarded as information projections with respect to the Kullback-Leibler divergence. This perspective yields Han-Shearer type inequalities and submodularity of the entropy rate of Markov chains, as well as applications in the context of large deviations and mixing time comparison. As concrete algorithmic applications in Markov chain Monte Carlo (MCMC) and approximate inference, we provide three illustrations based on lifted MCMC, the swapping algorithm, and factored filtering to demonstrate that projection samplers improve mixing over the original samplers. The projection sampler based on the swapping algorithm resamples the highest-temperature coordinate at stationarity at each step, and we prove that this practice accelerates mixing by multiplicative factors related to the number of temperatures and the dimension of the underlying state space when compared with the original swapping algorithm. Through simple numerical experiments on a bimodal target distribution, we show that the projection samplers mix effectively, in contrast to lifted MCMC and the swapping algorithm, which mix less well. In filtering, our proposed factored filtering scheme scales to high dimensions with linear-in-dimension computational cost per step, compared with the exponential-in-dimension cost per step of the exact filter, at the price of an approximation error that can be tracked using the distance to independence.

ML May 20, 2021
On the $α$-lazy version of Markov chains in estimation and testing problems

Sela Fried, Geoffrey Wolfer

Given access to a single long trajectory generated by an unknown irreducible Markov chain $M$, we simulate an $α$-lazy version of $M$ which is ergodic. This enables us to generalize recent results on estimation and identity testing that were stated for ergodic Markov chains in a way that allows fully empirical inference. In particular, our approach shows that the pseudo-spectral gap introduced by Paulin [2015] and defined for ergodic Markov chains may be given a meaning already in the case of irreducible but possibly periodic Markov chains.
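A minimal sketch of the lazy-chain idea, assuming the common convention that the $α$-lazy version of a transition matrix $M$ is $αI + (1-α)M$ (the abstract does not spell out the convention, so treat this as an assumption). The helper names `lazy_version` and `lazy_trajectory` are hypothetical. The second function shows one way a lazy trajectory could be simulated from an observed one without knowing $M$: each observed state is held for a geometric number of extra steps.

```python
import random


def lazy_version(P, alpha):
    """Return alpha * I + (1 - alpha) * P for a row-stochastic matrix P (list of lists)."""
    n = len(P)
    return [
        [alpha * (1.0 if i == j else 0.0) + (1.0 - alpha) * P[i][j] for j in range(n)]
        for i in range(n)
    ]


def lazy_trajectory(trajectory, alpha, rng):
    """Simulate a trajectory of the lazy chain from an observed trajectory.

    Each observed state is repeated while an independent coin with bias alpha
    says "stay"; alpha must lie in [0, 1) so that the holding times are finite.
    """
    out = []
    for state in trajectory:
        out.append(state)
        while rng.random() < alpha:
            out.append(state)
    return out


if __name__ == "__main__":
    flip = [[0.0, 1.0], [1.0, 0.0]]  # irreducible but periodic (period 2)
    print(lazy_version(flip, 0.5))   # every entry becomes 0.5, so the lazy chain is aperiodic
    print(lazy_trajectory([0, 1, 0, 1, 0], 0.5, random.Random(0)))
```

The resulting trajectory has random length, so a fixed-length analysis would need to truncate it; that bookkeeping is omitted here.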

ST May 13, 2021
Identity testing of reversible Markov chains

Sela Fried, Geoffrey Wolfer

We consider the problem of identity testing of Markov chain transition matrices based on a single trajectory of observations under the distance notion introduced by Daskalakis et al. [2018a] and further analyzed by Cherapanamjeri and Bartlett [2019]. Both works made the restrictive assumption that the Markov chains under consideration are symmetric. In this work we relax the symmetry assumption and show that it is possible to perform identity testing under the much weaker assumption of reversibility, provided that the stationary distributions of the reference and of the unknown Markov chains are close under a distance notion related to the separation distance. Additionally, we provide intuition on the distance notion of Daskalakis et al. [2018a] by showing how it behaves under several natural operations. In particular, we address some of their open questions.

PR Dec 14, 2019
Empirical and Instance-Dependent Estimation of Markov Chain and Mixing Time

Geoffrey Wolfer

We address the problem of estimating the mixing time of a Markov chain from a single trajectory of observations. Unlike most previous works, which employed Hilbert space methods to estimate spectral gaps, we opt for an approach based on contraction with respect to total variation. Specifically, we estimate the contraction coefficient introduced in Wolfer [2020], inspired by Dobrushin's. This quantity, unlike the spectral gap, controls the mixing time up to strong universal constants and remains applicable to non-reversible chains. We improve existing fully data-dependent confidence intervals around this contraction coefficient, which are both easier to compute and narrower than their spectral counterparts. Furthermore, we introduce a novel analysis beyond the worst-case scenario by leveraging additional information about the transition matrix. This allows us to derive instance-dependent rates for estimating the matrix with respect to the induced uniform norm, and some of its mixing properties.
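For orientation, the sketch below computes the classical Dobrushin contraction coefficient of a known transition matrix, i.e. the largest total variation distance between two rows. This is the textbook quantity that the abstract says inspired the coefficient of Wolfer [2020]; it is not that generalized coefficient, and the function names are assumptions for illustration.

```python
def total_variation(p, q):
    """Total variation distance between two probability vectors of equal length."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def dobrushin_coefficient(P):
    """Classical Dobrushin coefficient: max over state pairs of TV(P[i], P[j]).

    A value strictly below 1 means one step of the chain contracts total
    variation distance between any two initial distributions by that factor.
    """
    return max(total_variation(row_i, row_j) for row_i in P for row_j in P)


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]
    print(dobrushin_coefficient(P))
```

Unlike an eigenvalue computation, this only compares rows pairwise, which is one reason contraction-based quantities can be simpler to compute and do not require reversibility.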

ST Feb 1, 2019
Estimating the Mixing Time of Ergodic Markov Chains

Geoffrey Wolfer, Aryeh Kontorovich

We address the problem of estimating the mixing time $t_{\mathsf{mix}}$ of an arbitrary ergodic finite-state Markov chain from a single trajectory of length $m$. The reversible case was addressed by Hsu et al. [2019], who left the general case as an open problem. In the reversible case, the analysis is greatly facilitated by the fact that the Markov operator is self-adjoint, and Weyl's inequality allows for a dimension-free perturbation analysis of the empirical eigenvalues. As Hsu et al. point out, in the absence of reversibility (which induces asymmetric pair probabilities matrices), the existing perturbation analysis has a worst-case exponential dependence on the number of states $d$. Furthermore, even if an eigenvalue perturbation analysis with better dependence on $d$ were available, in the non-reversible case the connection between the spectral gap and the mixing time is not nearly as straightforward as in the reversible case. Our key insight is to estimate the pseudo-spectral gap $γ_{\mathsf{ps}}$ instead, which allows us to overcome the loss of symmetry and to achieve a polynomial dependence on the minimal stationary probability $π_\star$ and $γ_{\mathsf{ps}}$. Additionally, in the reversible case, we obtain simultaneous nearly (up to logarithmic factors) minimax rates in $t_{\mathsf{mix}}$ and precision $\varepsilon$, closing a gap in Hsu et al., who treated $\varepsilon$ as constant in the lower bounds. Finally, we construct fully empirical confidence intervals for $γ_{\mathsf{ps}}$, which shrink to zero at a rate of roughly $1/\sqrt{m}$, and improve the state of the art in even the reversible case.

ML Jan 31, 2019
Minimax Testing of Identity to a Reference Ergodic Markov Chain

Geoffrey Wolfer, Aryeh Kontorovich

We exhibit an efficient procedure for testing, based on a single long state sequence, whether an unknown Markov chain is identical to or $\varepsilon$-far from a given reference chain. We obtain nearly matching (up to logarithmic factors) upper and lower sample complexity bounds for our notion of distance, which is based on total variation. Perhaps surprisingly, we discover that the sample complexity depends solely on the properties of the known reference chain and does not involve the unknown chain at all, which is not even assumed to be ergodic.

ML Sep 13, 2018
Statistical Estimation of Ergodic Markov Chain Kernel over Discrete State Space

Geoffrey Wolfer, Aryeh Kontorovich

We investigate the statistical complexity of estimating the parameters of a discrete-state Markov chain kernel from a single long sequence of state observations. In the finite case, we characterize (modulo logarithmic factors) the minimax sample complexity of estimation with respect to the operator infinity norm, while in the countably infinite case, we analyze the problem with respect to a natural entry-wise norm derived from total variation. We show that in both cases, the sample complexity is governed by the mixing properties of the unknown chain, for which, in the finite-state case, there are known finite-sample estimators with fully empirical confidence intervals.
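The estimation problem in the finite-state case can be sketched as follows: count observed transitions along a single trajectory, normalize each row, and measure the error in the operator infinity norm (the maximum row-wise $\ell_1$ distance). The function names and the choice to assign a uniform row to unvisited states are assumptions for illustration, not necessarily the estimator analyzed in the paper.

```python
def empirical_transition_matrix(trajectory, num_states):
    """Estimate a transition matrix from one trajectory of states in range(num_states)."""
    counts = [[0] * num_states for _ in range(num_states)]
    for current, nxt in zip(trajectory, trajectory[1:]):
        counts[current][nxt] += 1

    estimate = []
    for row in counts:
        total = sum(row)
        if total > 0:
            estimate.append([c / total for c in row])
        else:
            # Assumption: states with no observed outgoing transition get a uniform row.
            estimate.append([1.0 / num_states] * num_states)
    return estimate


def operator_infinity_distance(P, Q):
    """Operator infinity norm of P - Q: the largest row-wise l1 distance."""
    return max(
        sum(abs(a - b) for a, b in zip(row_p, row_q)) for row_p, row_q in zip(P, Q)
    )


if __name__ == "__main__":
    observed = [0, 1, 0, 1, 0, 0, 1]
    P_hat = empirical_transition_matrix(observed, 2)
    print(P_hat)
    print(operator_infinity_distance(P_hat, [[0.5, 0.5], [1.0, 0.0]]))
```

How long the trajectory must be for every row to be well estimated depends on how often each state is visited, which is where the mixing properties mentioned in the abstract enter.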