Daniel Zantedeschi

ML
3papers
1citation
Novelty73%
AI Score47

3 Papers

MLMar 2
Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits

Daniel Zantedeschi, Kumar Muthuraman

We develop a Fisher-geometric theory of stochastic gradient descent (SGD) in which mini-batch noise is an intrinsic, loss-induced matrix -- not an exogenous scalar variance. Under exchangeable sampling, the mini-batch gradient covariance is pinned down (to leading order) by the projected covariance of per-sample gradients: it equals projected Fisher information for well-specified likelihood losses and the projected Godambe (sandwich) matrix for general M-estimation losses. This identification forces a diffusion approximation with Fisher/Godambe-structured volatility (effective temperature tau = eta/b) and yields an Ornstein-Uhlenbeck linearization whose stationary covariance is given in closed form by a Fisher-Lyapunov equation. Building on this geometry, we prove matching minimax upper and lower bounds of order Theta(1/N) for Fisher/Godambe risk under a total oracle budget N; the lower bound holds under a martingale oracle condition (bounded predictable quadratic variation), strictly subsuming i.i.d. and exchangeable sampling. These results imply oracle-complexity guarantees for epsilon-stationarity in the Fisher dual norm that depend on an intrinsic effective dimension and a Fisher/Godambe condition number rather than ambient dimension or Euclidean conditioning. Experiments confirm the Lyapunov predictions and show that scalar temperature matching cannot reproduce directional noise structure.

97.1ITMar 26
Kakeya Conjecture and Conditional Kolmogorov Complexity

Nicholas G. Polson, Daniel Zantedeschi

This paper develops an information-theoretic framework for algorithmic complexity under regular identifiable fibering. The central question is: when a decoder is given information about the fiber label in a fibered geometric set, how much can the residual description length be reduced, and when does this reduction fail to bring dimension below the ambient rate? We formulate a directional compression principle, proposing that sets admitting regular, identifiable fiber decompositions should remain informationally incompressible at ambient dimension, unless the fiber structure is degenerate or adaptively chosen. The principle is phrased in the language of algorithmic dimension and the point-to-set principle of Lutz and Lutz, which translates pointwise Kolmogorov complexity into Hausdorff dimension. We prove an exact analytical result: under effectively bi-Lipschitz, identifiable, and computable fibering, the complexity of a point splits additively as the sum of fiber-label complexity and along-fiber residual complexity, up to logarithmic overhead, via the chain rule for Kolmogorov complexity. The Kakeya conjecture (asserting that sets containing a unit segment in every direction have Hausdorff dimension n) motivates the framework. The conjecture was recently resolved in R^3 by Wang and Zahl; it remains open in dimension n >= 4, precisely because adaptive fiber selection undermines the naive conditional split in the general case. We isolate this adaptive-fibering obstruction as the key difficulty and propose a formal research program connecting geometric measure theory, algorithmic complexity, and information-theoretic compression.

MLMar 5
Bayes with No Shame: Admissibility Geometries of Predictive Inference

Nicholas G. Polson, Daniel Zantedeschi

Four distinct admissibility geometries govern sequential and distribution-free inference: Blackwell risk dominance over convex risk sets, anytime-valid admissibility within the nonnegative supermartingale cone, marginal coverage validity over exchangeable prediction sets, and Cesàro approachability (CAA) admissibility, which reaches the risk-set boundary via approachability-style arguments rather than explicit priors. We prove a criterion separation theorem: the four classes of admissible procedures are pairwise non-nested. Each geometry carries a different certificate of optimality: a supporting-hyperplane prior (Blackwell), a nonnegative supermartingale (anytime-valid), an exchangeability rank (coverage), or a Cesàro steering argument (CAA). Martingale coherence is necessary for Blackwell admissibility and necessary and sufficient for anytime-valid admissibility within e-processes, but is not sufficient for Blackwell admissibility and is not necessary for coverage validity or CAA-admissibility. All four criteria share a common optimization template (minimize Bayesian risk subject to a feasibility constraint), but the constraint sets operate over different spaces, partial orders, and performance metrics, making them geometrically incompatible. Admissibility is irreducibly criterion-relative.