CVDec 10, 2022
An approach to robust ICP initializationAlexander Kolpakov, Michael Werman
In this note, we propose an approach to initialize the Iterative Closest Point (ICP) algorithm to match unlabelled point clouds related by rigid transformations. The method is based on matching the ellipsoids defined by the points' covariance matrices and then testing the various principal half-axes matchings that differ by elements of a finite reflection group. We derive bounds on the robustness of our approach to noise and numerical experiments confirm our theoretical findings.
CVMar 5, 2023
Robust affine point matching via quadratic assignment on GrassmanniansAlexander Kolpakov, Michael Werman
Robust Affine Matching with Grassmannians (RoAM) is a new algorithm to perform affine registration of point clouds. The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian. For this purpose, an indefinite relaxation of the Quadratic Assignment Problem (QAP) is used, and several approaches to affine feature matching are studied and compared. Experiments demonstrate that RoAM is more robust to noise and point discrepancy than previous methods.
LGApr 28
DiRe-RAPIDS: Topology-faithful dimensionality reduction at scaleAlexander Kolpakov, Igor Rivin
Dimensionality reduction methods such as UMAP and t-SNE are central tools for visualising high-dimensional data, but their local-neighborhood objectives can preserve sampling noise while distorting global topology. We show that standard local metrics reward this noise memorisation: top-performing embeddings invent cycles and disconnected islands absent from the data. We introduce a topology-faithfulness benchmark based on noisy manifolds with known homology, tune DiRe against it, and find Pareto-optimal configurations that match or beat GPU-accelerated UMAP on classification while recovering exact first Betti numbers on stress tests. On 723K arXiv paper embeddings, DiRe preserves 3-4 times more topological structure than UMAP at comparable wall-clock.
SIJun 9, 2025
Fast Geometric Embedding for Node Influence MaximizationAlexander Kolpakov, Igor Rivin
Computing classical centrality measures such as betweenness and closeness is computationally expensive on large-scale graphs. In this work, we introduce an efficient force layout algorithm that embeds a graph into a low-dimensional space, where the radial distance from the origin serves as a proxy for various centrality measures. We evaluate our method on multiple graph families and demonstrate strong correlations with degree, PageRank, and paths-based centralities. As an application, it turns out that the proposed embedding allows to find high-influence nodes in a network, and provides a fast and scalable alternative to the standard greedy algorithm.
ITJul 17, 2025
Loss-Complexity Landscape and Model Structure FunctionsAlexander Kolpakov
We develop a framework for dualizing the Kolmogorov structure function $h_x(α)$, which then allows using computable complexity proxies. We establish a mathematical analogy between information-theoretic constructs and statistical mechanics, introducing a suitable partition function and free energy functional. We explicitly prove the Legendre-Fenchel duality between the structure function and free energy, showing detailed balance of the Metropolis kernel, and interpret acceptance probabilities as information-theoretic scattering amplitudes. A susceptibility-like variance of model complexity is shown to peak precisely at loss-complexity trade-offs interpreted as phase transitions. Practical experiments with linear and tree-based regression models verify these theoretical predictions, explicitly demonstrating the interplay between the model complexity, generalization, and overfitting threshold.
LGMar 5, 2025
Dimensionality reduction for homological stability and global structure preservationAlexander Kolpakov, Igor Rivin
We propose a new dimensionality reduction toolkit designed to address some of the challenges faced by traditional methods like UMAP and tSNE such as loss of global structure and computational efficiency. Built on the JAX framework, DiRe leverages modern hardware acceleration to provide an efficient, scalable, and interpretable solution for visualizing complex data structures, and for quantitative analysis of lower-dimensional embeddings. The toolkit shows considerable promise in preserving both local and global structures within the data as compared to state-of-the-art UMAP and tSNE implementations. This makes it suitable for a wide range of applications in machine learning, bio-informatics, and data science.
ITMar 19, 2024
Machine Learning of the Prime DistributionAlexander Kolpakov, Aidan Rocke
In the present work we use maximum entropy methods to derive several theorems in probabilistic number theory, including a version of the Hardy-Ramanujan Theorem. We also provide a theoretical argument explaining the experimental observations of Yang-Hui He about the learnability of primes, and posit that the Erdős-Kac law would very unlikely be discovered by current machine learning techniques. Numerical experiments that we perform corroborate our theoretical findings.