NAJun 21, 2016
Carrier frequencies, holomorphy and unwindingRonald R. Coifman, Stefan Steinerberger, Hau-tieng Wu
We prove that functions of intrinsic-mode type (a classical models for signals) behave essentially like holomorphic functions: adding a pure carrier frequency $e^{int}$ ensures that the anti-holomorphic part is much smaller than the holomorphic part $ \| P_{-}(f)\|_{L^2} \ll \|P_{+}(f)\|_{L^2}.$ This enables us to use techniques from complex analysis, in particular the \textit{unwinding series}. We study its stability and convergence properties and show that the unwinding series can stabilize and show that the unwinding series can provide a high resolution time-frequency representation, which is robust to noise.
NAMar 16, 2018
Detecting localized eigenstates of linear operatorsJianfeng Lu, Stefan Steinerberger
We describe a way of detecting the location of localized eigenvectors of a linear system $Ax = λx$ for eigenvalues $λ$ with $|λ|$ comparatively large. We define the family of functions $f_α: \left\{1.2. \dots, n\right\} \rightarrow \mathbb{R}_{}$ $$ f_α(k) = \log \left( \| A^α e_k \|_{\ell^2} \right),$$ where $α\geq 0$ is a parameter and $e_k = (0,0,\dots, 0,1,0, \dots, 0)$ is the $k-$th standard basis vector. We prove that eigenvectors associated to eigenvalues with large absolute value localize around local maxima of $f_α$: the metastable states in the power iteration method (slowing down its convergence) can be used to predict localization. We present a fast randomized algorithm and discuss different examples: a random band matrix, discretizations of the local operator $-Δ+ V$ and the nonlocal operator $(-Δ)^{3/4} + V$.
NAApr 18, 2017
Optimal Jittered Sampling for two Points in the Unit SquareFlorian Pausinger, Manas Rachh, Stefan Steinerberger
Jittered Sampling is a refinement of the classical Monte Carlo sampling method. Instead of picking $n$ points randomly from $[0,1]^2$, one partitions the unit square into $n$ regions of equal measure and then chooses a point randomly from each partition. Currently, no good rules for how to partition the space are available. In this paper, we present a solution for the special case of subdividing the unit square by a decreasing function into two regions so as to minimize the expected squared $\mathcal{L}_2-$discrepancy. The optimal partitions are given by a \textit{highly} nonlinear integral equation for which we determine an approximate solution. In particular, there is a break of symmetry and the optimal partition is not into two sets of equal measure. We hope this stimulates further interest in the construction of good partitions.
CAMay 12, 2016
Stability Estimates for Truncated Fourier and Laplace TransformsRoy R. Lederman, Stefan Steinerberger
We prove sharp stability estimates for the Truncated Laplace Transform and Truncated Fourier Transform. The argument combines an approach recently introduced by Alaifari, Pierce and the second author for the truncated Hilbert transform with classical results of Bertero, Grünbaum, Landau, Pollak and Slepian. In particular, we prove there is a universal constant $c >0$ such that for all $f \in L^2(\mathbb{R})$ with compact support in $[-1,1]$ normalized to $\|f\|_{L^2[-1,1]} = 1$ $$ \int_{-1}^{1}{|\widehat{f}(ξ)|^2dξ} \gtrsim \left(c\left\|f_x \right\|_{L^2[-1,1]} \right)^{- c\left\|f_x \right\|_{L^2[-1,1]}}$$ The inequality is sharp in the sense that there is an infinite sequence of orthonormal counterexamples if $c$ is chosen too small. The question whether and to which extent similar inequalities hold for generic families of integral operators remains open.
9.4LGApr 15
Complex Interpolation of Matrices with an application to Multi-Manifold LearningAdi Arbel, Stefan Steinerberger, Ronen Talmon
Given two symmetric positive-definite matrices $A, B \in \mathbb{R}^{n \times n}$, we study the spectral properties of the interpolation $A^{1-x} B^x$ for $0 \leq x \leq 1$. The presence of `common structures' in $A$ and $B$, eigenvectors pointing in a similar direction, can be investigated using this interpolation perspective. Generically, exact log-linearity of the operator norm $\|A^{1-x} B^x\|$ is equivalent to the existence of a shared eigenvector in the original matrices; stability bounds show that approximate log-linearity forces principal singular vectors to align with leading eigenvectors of both matrices. These results give rise to and provide theoretical justification for a multi-manifold learning framework that identifies common and distinct latent structures in multiview data.
LGAug 13, 2022
May the force be with youYulan Zhang, Anna C. Gilbert, Stefan Steinerberger
Modern methods in dimensionality reduction are dominated by nonlinear attraction-repulsion force-based methods (this includes t-SNE, UMAP, ForceAtlas2, LargeVis, and many more). The purpose of this paper is to demonstrate that all such methods, by design, come with an additional feature that is being automatically computed along the way, namely the vector field associated with these forces. We show how this vector field gives additional high-quality information and propose a general refinement strategy based on ideas from Morse theory. The efficiency of these ideas is illustrated specifically using t-SNE on synthetic and real-life data sets.
LGJan 24, 2025
Humanity's Last ExamLong Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
45.5LGApr 23
An effective variant of the Hartigan $k$-means algorithmFrançois Clément, Stefan Steinerberger
The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases, Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either dimension or $k$ increase.
SPOct 8, 2025
Time-Frequency Filtering Meets Graph ClusteringMarcelo A. Colominas, Stefan Steinerberger, Hau-Tieng Wu
We show that the problem of identifying different signal components from a time-frequency representation can be equivalently phrased as a graph clustering problem: given a graph $G=(V,E)$ one aims to identify `clusters', subgraphs that are strongly connected and have relatively few connections between them. The graph clustering problem is well studied, we show how these ideas can suggest (many) new ways to identify signal components. Numerical experiments illustrate the ideas.
SPJul 30, 2021
A common variable minimax theorem for graphsRonald R. Coifman, Nicholas F. Marshall, Stefan Steinerberger
Let $\mathcal{G} = \{G_1 = (V, E_1), \dots, G_m = (V, E_m)\}$ be a collection of $m$ graphs defined on a common set of vertices $V$ but with different edge sets $E_1, \dots, E_m$. Informally, a function $f :V \rightarrow \mathbb{R}$ is smooth with respect to $G_k = (V,E_k)$ if $f(u) \sim f(v)$ whenever $(u, v) \in E_k$. We study the problem of understanding whether there exists a nonconstant function that is smooth with respect to all graphs in $\mathcal{G}$, simultaneously, and how to find it if it exists.
LGFeb 25, 2021
t-SNE, Forceful Colorings and Mean Field LimitsYulan Zhang, Stefan Steinerberger
t-SNE is one of the most commonly used force-based nonlinear dimensionality reduction methods. This paper has two contributions: the first is forceful colorings, an idea that is also applicable to other force-based methods (UMAP, ForceAtlas2,...). In every equilibrium, the attractive and repulsive forces acting on a particle cancel out: however, both the size and the direction of the attractive (or repulsive) forces acting on a particle are related to its properties: the force vector can serve as an additional feature. Secondly, we analyze the case of t-SNE acting on a single homogeneous cluster (modeled by affinities coming from the adjacency matrix of a random k-regular graph); we derive a mean-field model that leads to interesting questions in classical calculus of variations. The model predicts that, in the limit, the t-SNE embedding of a single perfectly homogeneous cluster is not a point but a thin annulus of diameter $\sim k^{-1/4} n^{-1/4}$. This is supported by numerical results. The mean field ansatz extends to other force-based dimensionality reduction methods.
LGDec 15, 2020
Neural Collapse with Cross-Entropy LossJianfeng Lu, Stefan Steinerberger
We consider the variational problem of cross-entropy loss with $n$ feature vectors on a unit hypersphere in $\mathbb{R}^d$. We prove that when $d \geq n - 1$, the global minimum is given by the simplex equiangular tight frame, which justifies the neural collapse behavior. We also prove that as $n \rightarrow \infty$ with fixed $d$, the minimizing points will distribute uniformly on the hypersphere and show a connection with the frame potential of Benedetto & Fickus.
NAJul 27, 2020
On the Regularization Effect of Stochastic Gradient Descent applied to Least SquaresStefan Steinerberger
We study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality: the last term has one more matrix applied to the residual $u_k - u$ than the remaining terms: if $x_k - x$ is mainly comprised of large singular vectors, stochastic gradient descent leads to a quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values smoothes.
PRMar 22, 2020
Spectral Clustering Revisited: Information Hidden in the Fiedler VectorAdela DePavia, Stefan Steinerberger
We are interested in the clustering problem on graphs: it is known that if there are two underlying clusters, then the signs of the eigenvector corresponding to the second largest eigenvalue of the adjacency matrix can reliably reconstruct the two clusters. We argue that the vertices for which the eigenvector has the largest and the smallest entries, respectively, are unusually strongly connected to their own cluster and more reliably classified than the rest. This can be regarded as a discrete version of the Hot Spots conjecture and should be useful in applications. We give a rigorous proof for the stochastic block model and several examples.
LGFeb 27, 2020
The Spectral Underpinning of word2vecAriel Jaffe, Yuval Kluger, Ofir Lindenbaum et al.
word2vec due to Mikolov \textit{et al.} (2013) is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
LGFeb 15, 2019
Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisationsDmitry Kobak, George Linderman, Stefan Steinerberger et al.
T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the "crowding problem" of SNE. Here, we develop an efficient implementation of t-SNE for a $t$-distribution kernel with an arbitrary degree of freedom $ν$, with $ν\to\infty$ corresponding to SNE and $ν=1$ corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that $ν<1$ can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
NAApr 2, 2019
Quadrature Points via Heat Kernel RepulsionJianfeng Lu, Matthias Sachs, Stefan Steinerberger
We discuss the classical problem of how to pick $N$ weighted points on a $d-$dimensional manifold so as to obtain a reasonable quadrature rule $$ \frac{1}{|M|}\int_{M}{f(x) dx} \simeq \frac{1}{N} \sum_{n=1}^{N}{a_i f(x_i)}.$$ This problem, naturally, has a long history; the purpose of our paper is to propose selecting points and weights so as to minimize the energy functional $$ \sum_{i,j =1}^{N}{ a_i a_j \exp\left(-\frac{d(x_i,x_j)^2}{4t}\right) } \rightarrow \min, \quad \mbox{where}~t \sim N^{-2/d},$$ $d(x,y)$ is the geodesic distance and $d$ is the dimension of the manifold. This yields point sets that are theoretically guaranteed, via spectral theoretic properties of the Laplacian $-Δ$, to have good properties. One nice aspect is that the energy functional is universal and independent of the underlying manifold; we show several numerical examples.
MLJun 28, 2018
Recovering Trees with Convex ClusteringEric C. Chi, Stefan Steinerberger
Convex clustering refers, for given $\left\{x_1, \dots, x_n\right\} \subset \mathbb{R}^p$, to the minimization of \begin{eqnarray*} u(γ) & = & \underset{u_1, \dots, u_n }{\arg\min}\;\sum_{i=1}^{n}{\lVert x_i - u_i \rVert^2} + γ\sum_{i,j=1}^{n}{w_{ij} \lVert u_i - u_j\rVert},\\ \end{eqnarray*} where $w_{ij} \geq 0$ is an affinity that quantifies the similarity between $x_i$ and $x_j$. We prove that if the affinities $w_{ij}$ reflect a tree structure in the $\left\{x_1, \dots, x_n\right\}$, then the convex clustering solution path reconstructs the tree exactly. The main technical ingredient implies the following combinatorial byproduct: for every set $\left\{x_1, \dots, x_n \right\} \subset \mathbb{R}^p$ of $n \geq 2$ distinct points, there exist at least $n/6$ points with the property that for any of these points $x$ there is a unit vector $v \in \mathbb{R}^p$ such that, when viewed from $x$, `most' points lie in the direction $v$ \begin{eqnarray*} \frac{1}{n-1}\sum_{i=1 \atop x_i \neq x}^{n}{ \left\langle \frac{x_i - x}{\lVert x_i - x \rVert}, v \right\rangle} & \geq & \frac{1}{4}. \end{eqnarray*}
SPApr 25, 2018
On the Dual Geometry of Laplacian EigenfunctionsAlexander Cloninger, Stefan Steinerberger
We discuss the geometry of Laplacian eigenfunctions $-Δφ= λφ$ on compact manifolds $(M,g)$ and combinatorial graphs $G=(V,E)$. The 'dual' geometry of Laplacian eigenfunctions is well understood on $\mathbb{T}^d$ (identified with $\mathbb{Z}^d$) and $\mathbb{R}^n$ (which is self-dual). The dual geometry is of tremendous role in various fields of pure and applied mathematics. The purpose of our paper is to point out a notion of similarity between eigenfunctions that allows to reconstruct that geometry. Our measure of 'similarity' $ α(φ_λ, φ_μ)$ between eigenfunctions $φ_λ$ and $φ_μ$ is given by a global average of local correlations $$ α(φ_λ, φ_μ)^2 = \| φ_λ φ_μ \|_{L^2}^{-2}\int_{M}{ \left( \int_{M}{ p(t,x,y)( φ_λ(y) - φ_λ(x))( φ_μ(y) - φ_μ(x)) dy} \right)^2 dx},$$ where $p(t,x,y)$ is the classical heat kernel and $e^{-t λ} + e^{-t μ} = 1$. This notion recovers all classical notions of duality but is equally applicable to other (rough) geometries and graphs; many numerical examples in different continuous and discrete settings illustrate the result.
STMar 19, 2018
Numerical Integration on Graphs: where to sample and how to weighGeorge C. Linderman, Stefan Steinerberger
Let $G=(V,E,w)$ be a finite, connected graph with weighted edges. We are interested in the problem of finding a subset $W \subset V$ of vertices and weights $a_w$ such that $$ \frac{1}{|V|}\sum_{v \in V}^{}{f(v)} \sim \sum_{w \in W}{a_w f(w)}$$ for functions $f:V \rightarrow \mathbb{R}$ that are `smooth' with respect to the geometry of the graph. The main application are problems where $f$ is known to somehow depend on the underlying graph but is expensive to evaluate on even a single vertex. We prove an inequality showing that the integration problem can be rewritten as a geometric problem (`the optimal packing of heat balls'). We discuss how one would construct approximate solutions of the heat ball packing problem; numerical examples demonstrate the efficiency of the method.
LGDec 25, 2017
Efficient Algorithms for t-distributed Stochastic Neighborhood EmbeddingGeorge C. Linderman, Manas Rachh, Jeremy G. Hoskins et al.
t-distributed Stochastic Neighborhood Embedding (t-SNE) is a method for dimensionality reduction and visualization that has become widely popular in recent years. Efficient implementations of t-SNE are available, but they scale poorly to datasets with hundreds of thousands to millions of high dimensional data-points. We present Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE), which dramatically accelerates the computation of t-SNE. The most time-consuming step of t-SNE is a convolution that we accelerate by interpolating onto an equispaced grid and subsequently using the fast Fourier transform to perform the convolution. We also optimize the computation of input similarities in high dimensions using multi-threaded approximate nearest neighbors. We further present a modification to t-SNE called "late exaggeration," which allows for easier identification of clusters in t-SNE embeddings. Finally, for datasets that cannot be loaded into the memory, we present out-of-core randomized principal component analysis (oocPCA), so that the top principal components of a dataset can be computed without ever fully loading the matrix, hence allowing for t-SNE of large datasets to be computed on resource-limited machines.
CONov 13, 2017
Randomized Near Neighbor Graphs, Giant Components, and Applications in Data ScienceGeorge C. Linderman, Gal Mishne, Yuval Kluger et al.
If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k-$nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $ c_{d,1} \log{\log{n}}$ points chosen randomly among its $ c_{d,2} \log{n}-$nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log{n}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k-$nearest neighbors, one can often pick $k' \ll k$ random points out of the $k-$nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation, we illustrate this with several numerical examples.
LGJun 8, 2017
Clustering with t-SNE, provablyGeorge C. Linderman, Stefan Steinerberger
t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the `early exaggeration' phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter $α$ and step size $h$. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g. the swiss roll) improves. We also discuss a connection to spectral clustering methods.
SPJun 5, 2017
The Geometry of Nodal Sets and Outlier DetectionXiuyuan Cheng, Gal Mishne, Stefan Steinerberger
Let $(M,g)$ be a compact manifold and let $-Δφ_k = λ_k φ_k$ be the sequence of Laplacian eigenfunctions. We present a curious new phenomenon which, so far, we only managed to understand in a few highly specialized cases: the family of functions $f_N:M \rightarrow \mathbb{R}_{\geq 0}$ $$ f_N(x) = \sum_{k \leq N}{ \frac{1}{\sqrt{λ_k}} \frac{|φ_k(x)|}{\|φ_k\|_{L^{\infty}(M)}}}$$ seems strangely suited for the detection of anomalous points on the manifold. It may be heuristically interpreted as the sum over distances to the nearest nodal line and potentially hints at a new phenomenon in spectral geometry. We give rigorous statements on the unit square $[0,1]^2$ (where minima localize in $\mathbb{Q}^2$) and on Paley graphs (where $f_N$ recovers the geometry of quadratic residues of the underlying finite field $\mathbb{F}_p$). Numerical examples show that the phenomenon seems to arise on fairly generic manifolds.
MLFeb 9, 2017
Stochastic Neighbor Embedding separates well-separated clustersUri Shaham, Stefan Steinerberger
Stochastic Neighbor Embedding and its variants are widely used dimensionality reduction techniques -- despite their popularity, no theoretical results are known. We prove that the optimal SNE embedding of well-separated clusters from high dimensions to any Euclidean space R^d manages to successfully separate the clusters in a quantitative way. The result also applies to a larger family of methods including a variant of t-SNE.
SPNov 9, 2016
On the Diffusion Geometry of Graph Laplacians and ApplicationsXiuyuan Cheng, Manas Rachh, Stefan Steinerberger
We study directed, weighted graphs $G=(V,E)$ and consider the (not necessarily symmetric) averaging operator $$ (\mathcal{L}u)(i) = -\sum_{j \sim_{} i}{p_{ij} (u(j) - u(i))},$$ where $p_{ij}$ are normalized edge weights. Given a vertex $i \in V$, we define the diffusion distance to a set $B \subset V$ as the smallest number of steps $d_{B}(i) \in \mathbb{N}$ required for half of all random walks started in $i$ and moving randomly with respect to the weights $p_{ij}$ to visit $B$ within $d_{B}(i)$ steps. Our main result is that the eigenfunctions interact nicely with this notion of distance. In particular, if $u$ satisfies $\mathcal{L}u = λu$ on $V$ and $$ B = \left\{ i \in V: - \varepsilon \leq u(i) \leq \varepsilon \right\} \neq \emptyset,$$ then, for all $i \in V$, $$ d_{B}(i) \log{\left( \frac{1}{|1-λ|} \right) } \geq \log{\left( \frac{ |u(i)| }{\|u\|_{L^{\infty}}} \right)} - \log{\left(\frac{1}{2} + \varepsilon\right)}.$$ $d_B(i)$ is a remarkably good approximation of $|u|$ in the sense of having very high correlation. The result implies that the classical one-dimensional spectral embedding preserves particular aspects of geometry in the presence of clustered data. We also give a continuous variant of the result which has a connection to the hot spots conjecture.
MLJul 15, 2016
Spectral Echolocation via the Wave EmbeddingAlexander Cloninger, Stefan Steinerberger
Spectral embedding uses eigenfunctions of the discrete Laplacian on a weighted graph to obtain coordinates for an embedding of an abstract data set into Euclidean space. We propose a new pre-processing step of first using the eigenfunctions to simulate a low-frequency wave moving over the data and using both position as well as change in time of the wave to obtain a refined metric to which classical methods of dimensionality reduction can then applied. This is motivated by the behavior of waves, symmetries of the wave equation and the hunting technique of bats. It is shown to be effective in practice and also works for other partial differential equations -- the method yields improved results even for the classical heat equation.
NAOct 1, 2015
On the Discrepancy of Jittered SamplingFlorian Pausinger, Stefan Steinerberger
We study the discrepancy of jittered sampling sets: such a set $\mathcal{P} \subset [0,1]^d$ is generated for fixed $m \in \mathbb{N}$ by partitioning $[0,1]^d$ into $m^d$ axis aligned cubes of equal measure and placing a random point inside each of the $N = m^d$ cubes. We prove that, for $N$ sufficiently large, $$ \frac{1}{10}\frac{d}{N^{\frac{1}{2} + \frac{1}{2d}}} \leq \mathbb{E} D_N^*(\mathcal{P}) \leq \frac{\sqrt{d} (\log{N})^{\frac{1}{2}}}{N^{\frac{1}{2} + \frac{1}{2d}}},$$ where the upper bound with an unspecified constant $C_d$ was proven earlier by Beck. Our proof makes crucial use of the sharp Dvoretzky-Kiefer-Wolfowitz inequality and a suitably taylored Bernstein inequality; we have reasons to believe that the upper bound has the sharp scaling in $N$. Additional heuristics suggest that jittered sampling should be able to improve known bounds on the inverse of the star-discrepancy in the regime $N \gtrsim d^d$. We also prove a partition principle showing that every partition of $[0,1]^d$ combined with a jittered sampling construction gives rise to a set whose expected squared $L^2-$discrepancy is smaller than that of purely random points.
NADec 7, 2014
A Remark on Disk Packings and Numerical Integration of Harmonic FunctionsStefan Steinerberger
We are interested in the following problem: given an open, bounded domain $Ω\subset \mathbb{R}^2$, what is the largest constant $α= α(Ω) > 0$ such that there exist an infinite sequence of disks $B_1, B_2, \dots, B_N, \dots \subset \mathbb{R}^2$ and a sequence $(n_i)$ with $n_i \in \left\{1,2\right\}$ such that $$ \sup_{N \in \mathbb{N}}{N^α\left\| χ_Ω - \sum_{i=1}^{N}{(-1)^{n_i}χ_{B_i}}\right\|_{L^1(\mathbb{R}^2)}} < \infty,$$ where $χ$ denotes the characteristic function? We prove that certain (somewhat peculiar) domains $Ω\subset \mathbb{R}^2$ satisfy the property with $α= 0.53$. For these domains there exists a sequence of points $(x_i)_{i=1}^{\infty}$ in $Ω$ with weights $(a_i)_{i=1}^{\infty}$ such that for all harmonic functions $u:\mathbb{R}^2 \rightarrow \mathbb{R}$ $$ \left|\int_Ω{u(x)dx} - \sum_{i=1}^{N}{a_i u(x_i)}\right| \leq C_Ω\frac{\|u\|_{L^{\infty}(Ω)}}{N^{0.53}},$$ where $C_Ω$ depends only on $Ω$. This gives a Quasi-Monte-Carlo method for harmonic functions which improves on the probabilistic Monte-Carlo bound $\|u\|_{L^{2}(Ω)}/N^{0.5}$ \textit{without} introducing a dependence on the total variation. We do not know which decay rates are optimal.