Levin Maier

2 Papers

41.7 SG Mar 20
Information Geometry via the Q-Root Transform

Levin Maier

In this paper, we introduce \emph{\(\ell^p\)-information geometry}, an infinite-dimensional framework that shares key features with the geometry of the space of probability densities \( \mathrm{Dens}(M) \) on a closed manifold, while also incorporating aspects of measure-valued information geometry. We define the \emph{\(\ell^2\)-probability simplex} with a noncanonical differentiable structure induced via the \emph{\(q\)-root transform} from an open subset of the \( \ell^q \)-sphere. This choice makes the \(q\)-root transform an \emph{isometry} and allows us to construct the \(\ell^2\)- and \(\ell^q\)-Fisher--Rao geometries, including \emph{Amari--Čencov \(\alpha\)-connections} and a \emph{Chern connection} in the \(\ell^q\)-setting. We then apply this framework to an infinite-dimensional linear optimization problem. We show that the corresponding gradient flow with respect to the \(\ell^2\)-Fisher--Rao metric can be solved explicitly, converges to a maximizer under a natural monotonicity assumption, and admits an interpretation as the geodesic flow of an \emph{exponential connection}. In particular, we prove that this \(e\)-connection is \emph{geodesically complete}. We further relate these flows to a \emph{completely integrable Hamiltonian system} through a \emph{momentum map} associated with a Hamiltonian torus action on infinite-dimensional complex projective space. Finally, inspired by the \(\ell^2\)-theory, we outline an analogous Fisher--Rao geometry for \( \mathrm{Dens}(M) \) on possibly noncompact Riemannian manifolds, showing that, with a suitable spherical differentiable structure, the square-root transform remains an \emph{isometry}.
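As a rough finite-dimensional illustration of two ideas in this abstract, the sketch below truncates everything to a probability vector of length n. It is not taken from the paper. It assumes the q-root transform acts coordinatewise as p_i -> p_i^(1/q), and it assumes the standard finite-dimensional fact that the Fisher-Rao gradient flow of a linear objective <c, p> is the replicator equation dp_i/dt = p_i (c_i - <c, p>), whose solution is p_i(t) proportional to p_i(0) exp(c_i t). The time scaling may differ from the paper's by a constant factor depending on how the metric is normalized. All function names are hypothetical.

```python
import math

def q_root(p, q):
    """Assumed coordinatewise q-root transform: x_i = p_i ** (1/q)."""
    return [pi ** (1.0 / q) for pi in p]

def lq_norm(x, q):
    """l^q norm of a finite vector."""
    return sum(abs(xi) ** q for xi in x) ** (1.0 / q)

def replicator_rhs(p, c):
    """Right-hand side p_i * (c_i - <c, p>) of the replicator equation."""
    mean = sum(ci * pi for ci, pi in zip(c, p))
    return [pi * (ci - mean) for pi, ci in zip(p, c)]

def replicator_solution(p0, c, t):
    """Closed-form solution: p_i(t) proportional to p_i(0) * exp(c_i * t)."""
    m = max(c)  # shift exponents by max(c) for numerical stability
    w = [pi * math.exp((ci - m) * t) for pi, ci in zip(p0, c)]
    z = sum(w)
    return [wi / z for wi in w]

def objective(p, c):
    """Linear objective <c, p>."""
    return sum(ci * pi for ci, pi in zip(c, p))

if __name__ == "__main__":
    p0 = [0.5, 0.3, 0.2]   # illustrative starting point on the simplex
    c = [1.0, 2.0, 3.0]    # illustrative cost vector
    print(lq_norm(q_root(p0, 3), 3))       # lies on the unit l^3 sphere
    for t in (0.0, 1.0, 5.0):
        p = replicator_solution(p0, c, t)
        print(t, p, objective(p, c))       # objective increases with t
```

In this truncated setting the q-root image of a probability vector has unit l^q norm because the q-th powers of its coordinates sum to one, and the flow concentrates mass on the coordinate with the largest c_i as t grows.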

DC Feb 9
Mathematical Foundations of Modeling ETL Process Chains

Levin Maier, Lucas Schulze, Robert Lilow et al.

Extract-Transform-Load (ETL) processes are core components of modern data processing infrastructures. The throughput of processed data records can be adjusted by changing the amount of allocated resources, i.e.~the number of parallel processing threads for each of the three ETL phases, but also depends on stochastic variations in the per-record processing times. In chains of multiple consecutive ETL processes, the relation between allocated resources and overall throughput is further complicated, for example by the occurrence of bottlenecks affecting all subsequent ETL processes. We develop a mathematical model of ETL process chains that is accurate at the level of time-aggregated throughput and suitable for efficient simulation. The process chain is represented as a controlled discrete-time Markov process on a directed acyclic graph whose edges are individual ETL processes. We model the mean throughput as a bounded, monotone function of the number of parallel threads, to capture the diminishing benefit of allocating more threads. We furthermore introduce a Flow Balance postulate linking the number of threads, the mean throughput, and the mean processing time. The stochastic processing times are then modeled by non-negative heavy-tailed distributions around the mean processing time. This framework provides a principled simulator for ETL networks and a foundation for learning- and control-based resource allocation.
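The three modeling ingredients named in this abstract can be sketched for a single ETL phase as follows. None of the concrete choices come from the paper; they are assumptions made only for illustration. The saturating curve `mu_max * n / (n + half_sat)` stands in for "a bounded, monotone function". The relation `throughput = n / mean_time` is one plausible reading of a flow-balance condition. The lognormal distribution is one example of a non-negative heavy-tailed law. All names and parameter values are hypothetical.

```python
import math
import random

def mean_throughput(n_threads, mu_max=1000.0, half_sat=4.0):
    """Hypothetical saturating curve: monotone in n_threads, bounded by mu_max."""
    return mu_max * n_threads / (n_threads + half_sat)

def mean_processing_time(n_threads, mu_max=1000.0, half_sat=4.0):
    """Assumed flow balance: n threads, each taking tau per record on average,
    deliver n / tau records per unit time, so tau = n / throughput."""
    return n_threads / mean_throughput(n_threads, mu_max, half_sat)

def sample_processing_time(rng, mean, sigma=0.5):
    """Lognormal sample with the requested mean (E[X] = exp(mu + sigma^2 / 2))."""
    mu = math.log(mean) - 0.5 * sigma ** 2
    return rng.lognormvariate(mu, sigma)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for n in (1, 2, 4, 8, 16):
        tau = mean_processing_time(n)
        draws = [sample_processing_time(rng, tau) for _ in range(5)]
        print(n, round(mean_throughput(n), 1), tau, draws)
```

Doubling the thread count in this sketch yields progressively smaller throughput gains, which is the diminishing-benefit behavior the abstract describes; a chain model would additionally cap each process by the output of its predecessors.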