LGSep 6, 2022
Use and Misuse of Machine Learning in AnthropologyJeff Calder, Reed Coil, Annie Melton et al.
Machine learning (ML), being now widely accessible to the research community at large, has fostered a proliferation of new and striking applications of these emergent mathematical techniques across a wide range of disciplines. In this paper, we will focus on a particular case study: the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence. As we will show, the easy availability of ML algorithms and lack of expertise on their proper use among the anthropological research community has led to foundational misapplications that have appeared throughout the literature. The resulting unreliable results not only undermine efforts to legitimately incorporate ML into anthropological research, but produce potentially faulty understandings about our human evolutionary and behavioral past. The aim of this paper is to provide a brief introduction to some of the ways in which ML has been applied within paleoanthropology; we also include a survey of some basic ML algorithms for those who are not fully conversant with the field, which remains under active development. We discuss a series of missteps, errors, and violations of correct protocols of ML methods that appear disconcertingly often within the accumulating body of anthropological literature. These mistakes include use of outdated algorithms and practices; inappropriate train/test splits, sample composition, and textual explanations; as well as an absence of transparency due to the lack of data/code sharing, and the subsequent limitations imposed on independent replication. We assert that expanding samples, sharing data and code, re-evaluating approaches to peer review, and, most importantly, developing interdisciplinary teams that include experts in ML are all necessary for progress in future research incorporating ML within anthropology.
LGJul 19, 2023
Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar DatasetsJames Chapman, Bohan Chen, Zheng Tan et al.
Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data arXiv:2204.00005. In each iteration, sequential active learning selects a query set of size one while batch active learning selects a query set of multiple datapoints. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly identical accuracy as sequential active learning but is more efficient, proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms the state-of-the-art CNN-based methods.
CVMay 5, 2022
The Batch Artifact Scanning Protocol: A new method using computed tomography (CT) to rapidly create three-dimensional models of objects from large collections en masseKatrina Yezzi-Woodley, Jeff Calder, Mckenzie Sweno et al.
Within anthropology, the use of three-dimensional (3D) imaging has become increasingly common and widespread since it broadens the available avenues for addressing a wide range of key anthropological issues. The ease with which 3D models can be generated and shared has major impact on research, cultural heritage, education, science communication, and public engagement, as well as contributing to the preservation of the physical specimens and archiving collections in widely accessible data bases. Current scanning protocols have the ability to create the required research quality 3D models; however, they tend to be time and labor intensive and not practical when working with large collections. Here we describe a streamlined Batch Artifact Scanning Protocol to rapidly create 3D models using a medical CT scanner. While this method can be used on a variety of material types, we have, for specificity, applied our protocol to a large collection of experimentally broken ungulate limb bones. By employing the Batch Artifact Scanning Protocol, we were able to efficiently create 3D models of 2,474 bone fragments at a rate of less than 4 minutes per specimen.
CVMay 20, 2022
Using machine learning on new feature sets extracted from 3D models of broken animal bones to classify fragments according to break agentKatrina Yezzi-Woodley, Alexander Terwilliger, Jiafeng Li et al.
Distinguishing agents of bone modification at paleoanthropological sites is at the root of much of the research directed at understanding early hominin exploitation of large animal resources and the effects those subsistence behaviors had on early hominin evolution. However, current methods, particularly in the area of fracture pattern analysis as a signal of marrow exploitation, have failed to overcome equifinality. Furthermore, researchers debate the replicability and validity of current and emerging methods for analyzing bone modifications. Here we present a new approach to fracture pattern analysis aimed at distinguishing bone fragments resulting from hominin bone breakage and those produced by carnivores. This new method uses 3D models of fragmentary bone to extract a much richer dataset that is more transparent and replicable than feature sets previously used in fracture pattern analysis. Supervised machine learning algorithms are properly used to classify bone fragments according to agent of breakage with average mean accuracy of 77% across tests.
PRFeb 22, 2024
Ratio convergence rates for Euclidean first-passage percolation: Applications to the graph infinity LaplacianLeon Bungert, Jeff Calder, Tim Roith
In this paper we prove the first quantitative convergence rates for the graph infinity Laplace equation for length scales at the connectivity threshold. In the graph-based semi-supervised learning community this equation is also known as Lipschitz learning. The graph infinity Laplace equation is characterized by the metric on the underlying space, and convergence rates follow from convergence rates for graph distances. At the connectivity threshold, this problem is related to Euclidean first passage percolation, which is concerned with the Euclidean distance function $d_{h}(x,y)$ on a homogeneous Poisson point process on $\mathbb{R}^d$, where admissible paths have step size at most $h>0$. Using a suitable regularization of the distance function and subadditivity we prove that ${d_{h_s}(0,se_1)}/ s \to σ$ as $s\to\infty$ almost surely where $σ\geq 1$ is a dimensional constant and $h_s\gtrsim \log(s)^\frac{1}{d}$. A convergence rate is not available due to a lack of approximate superadditivity when $h_s\to \infty$. Instead, we prove convergence rates for the ratio $\frac{d_{h}(0,se_1)}{d_{h}(0,2se_1)}\to \frac{1}{2}$ when $h$ is frozen and does not depend on $s$. Combining this with the techniques that we developed in (Bungert, Calder, Roith, IMA Journal of Numerical Analysis, 2022), we show that this notion of ratio convergence is sufficient to establish uniform convergence rates for solutions of the graph infinity Laplace equation at percolation length scales.
MLOct 27, 2022
Poisson Reweighted Laplacian Uncertainty Sampling for Graph-based Active LearningKevin Miller, Jeff Calder
We show that uncertainty sampling is sufficient to achieve exploration versus exploitation in graph-based active learning, as long as the measure of uncertainty properly aligns with the underlying model and the model properly reflects uncertainty in unexplored regions. In particular, we use a recently developed algorithm, Poisson ReWeighted Laplace Learning (PWLL) for the classifier and we introduce an acquisition function designed to measure uncertainty in this graph-based classifier that identifies unexplored regions of the data. We introduce a diagonal perturbation in PWLL which produces exponential localization of solutions, and controls the exploration versus exploitation tradeoff in active learning. We use the well-posed continuum limit of PWLL to rigorously analyze our method, and present experimental results on a number of graph-based image classification problems.
85.0NAMay 16
Dynamics Over Landscape: The Emergence of Linear Separability via Spectral Alignment in Contrastive LearningJeff Calder, Wonjun Lee
Contrastive learning effectively clusters data despite a loss landscape filled with poor solutions, a success that is heavily dependent on the choice of data augmentations. How optimization consistently finds meaningful patterns remains an open question. We show this success stems from training dynamics rather than the loss function alone. Crucially, under a highly specific structural assumption governing the connectivity and variance of the data augmentations, we prove that once a critical spectral alignment threshold is reached, data features inevitably and rapidly separate into distinct clusters. We establish this mechanism for both discrete datasets and the macroscopic continuum limit, modeling latent dynamics as a Wasserstein gradient flow to demonstrate that this separation persists as the number of data points approaches infinity. We hypothesize that natural training dynamics inherently drive the system toward this critical state. We extensively validate this empirically across four diverse domains (synthetic shapes, images, text, and PDEs). In every setting, a sharp increase in this spectral quantity consistently precedes clean data separation, revealing that contrastive learning's success is governed by a dynamically emerging trigger tightly coupled to the underlying augmentation structure.
NADec 5, 2017
High-order filtered schemes for the Hamilton-Jacobi continuum limit of nondominated sortingWarut Thawinrak, Jeff Calder
We investigate high-order finite difference schemes for the Hamilton-Jacobi equation continuum limit of nondominated sorting. Nondominated sorting is an algorithm for sorting points in Euclidean space into layers by repeatedly removing minimal elements. It is widely used in multi-objective optimization, which finds applications in many scientific and engineering contexts, including machine learning. In this paper, we show how to construct filtered schemes, which combine high order possibly unstable schemes with first order monotone schemes in a way that guarantees stability and convergence while enjoying the additional accuracy of the higher order scheme in regions where the solution is smooth. We prove that our filtered schemes are stable and converge to the viscosity solution of the Hamilton-Jacobi equation, and we provide numerical simulations to investigate the rate of convergence of the new schemes.
43.7MLApr 13
On the continuum limit of t-SNE for data visualizationJeff Calder, Zhonggan Huang, Ryan Murray et al.
This work is concerned with the continuum limit of a graph-based data visualization technique called the t-Distributed Stochastic Neighbor Embedding (t-SNE), which is widely used for visualizing data in a variety of applications, but is still poorly understood from a theoretical standpoint. The t-SNE algorithm produces visualizations by minimizing the Kullback-Leibler divergence between similarity matrices representing the high dimensional data and its low dimensional representation. We prove that as the number of data points $n \to \infty$, after a natural rescaling and in applicable parameter regimes, the Kullback-Leibler divergence is consistent as the number of data points $n \to \infty$ and the similarity graph remains sparse with a continuum variational problem that involves a non-convex gradient regularization term and a penalty on the magnitude of the probability density function in the visualization space. These two terms represent the continuum limits of the attraction and repulsion forces in the t-SNE algorithm. Due to the lack of convexity in the continuum variational problem, the question of well-posedeness is only partially resolved. We show that when both dimensions are $1$, the problem admits a unique smooth minimizer, along with an infinite number of discontinuous minimizers (interpreted in a relaxed sense). This aligns well with the empirically observed ability of t-SNE to separate data in seemingly arbitrary ways in the visualization. The energy is also very closely related to the famously ill-posed Perona-Malik equation, which is used for denoising and simplifying images. We present numerical results validating the continuum limit, provide some preliminary results about the delicate nature of the limiting energetic problem in higher dimensions, and highlight several problems for future work.
APJul 9, 2024
Convergence rates for Poisson learning to a Poisson equation with measure dataLeon Bungert, Jeff Calder, Max Mihailescu et al.
In this paper we prove discrete to continuum convergence rates for Poisson Learning, a graph-based semi-supervised learning algorithm that is based on solving the graph Poisson equation with a source term consisting of a linear combination of Dirac deltas located at labeled points and carrying label information. The corresponding continuum equation is a Poisson equation with measure data in a Euclidean domain $Ω\subset \mathbb{R}^d$. The singular nature of these equations is challenging and requires an approach with several distinct parts: (1) We prove quantitative error estimates when convolving the measure data of a Poisson equation with (approximately) radial function supported on balls. (2) We use quantitative variational techniques to prove discrete to continuum convergence rates on random geometric graphs with bandwidth $\varepsilon>0$ for bounded source terms. (3) We show how to regularize the graph Poisson equation via mollification with the graph heat kernel, and we study fine asymptotics of the heat kernel on random geometric graphs. Combining these three pillars we obtain $L^1$ convergence rates that scale, up to logarithmic factors, like $O(\varepsilon^{\frac{1}{d+2}})$ for general data distributions, and $O(\varepsilon^{\frac{2-σ}{d+4}})$ for uniformly distributed data, where $σ>0$. These rates are valid with high probability if $\varepsilon\gg\left({\log n}/{n}\right)^q$ where $n$ denotes the number of vertices of the graph and $q \approx \frac{1}{3d}$.
NANov 10, 2020Code
The Virtual Goniometer: A new method for measuring angles on 3D models of fragmentary bone and lithicsKatrina Yezzi-Woodley, Jeff Calder, Peter J. Olver et al.
The contact goniometer is a commonly used tool in lithic and zooarchaeological analysis, despite suffering from a number of shortcomings due to the physical interaction between the measuring implement, the object being measured, and the individual taking the measurements. However, lacking a simple and efficient alternative, researchers in a variety of fields continue to use the contact goniometer to this day. In this paper, we present a new goniometric method that we call the virtual goniometer, which takes angle measurements virtually on a 3D model of an object. The virtual goniometer allows for rapid data collection, and for the measurement of many angles that cannot be physically accessed by a manual goniometer. We compare the intra-observer variability of the manual and virtual goniometers, and find that the virtual goniometer is far more consistent and reliable. Furthermore, the virtual goniometer allows for precise replication of angle measurements, even among multiple users, which is important for reproducibility of goniometric-based research. The virtual goniometer is available as a plug-in in the open source mesh processing packages Meshlab and Blender, making it easily accessible to researchers exploring the potential for goniometry to improve archaeological methods and address anthropological questions.
STJan 15, 2024
Consistency of semi-supervised learning, stochastic tug-of-war games, and the p-LaplacianJeff Calder, Nadejda Drenska
In this paper we give a broad overview of the intersection of partial differential equations (PDEs) and graph-based semi-supervised learning. The overview is focused on a large body of recent work on PDE continuum limits of graph-based learning, which have been used to prove well-posedness of semi-supervised learning algorithms in the large data limit. We highlight some interesting research directions revolving around consistency of graph-based semi-supervised learning, and present some new results on the consistency of $p$-Laplacian semi-supervised learning using the stochastic tug-of-war game interpretation of the $p$-Laplacian. We also present the results of some numerical experiments that illustrate our results and suggest directions for future work.
NAJan 16, 2025
Geometry-Preserving Encoder/Decoder in Latent Generative ModelsWonjun Lee, Riley C. W. O'Neill, Dongmian Zou et al.
Generative modeling aims to generate new data samples that resemble a given dataset, with diffusion models recently becoming the most popular generative model. One of the main challenges of diffusion models is solving the problem in the input space, which tends to be very high-dimensional. Recently, solving diffusion models in the latent space through an encoder that maps from the data space to a lower-dimensional latent space has been considered to make the training process more efficient and has shown state-of-the-art results. The variational autoencoder (VAE) is the most commonly used encoder/decoder framework in this domain, known for its ability to learn latent representations and generate data samples. In this paper, we introduce a novel encoder/decoder framework with theoretical properties distinct from those of the VAE, specifically designed to preserve the geometric structure of the data distribution. We demonstrate the significant advantages of this geometry-preserving encoder in the training process of both the encoder and decoder. Additionally, we provide theoretical results proving convergence of the training process, including convergence guarantees for encoder training, and results showing faster convergence of decoder training when using the geometry-preserving encoder.
NAOct 29, 2025
Energy Approach from $\varepsilon$-Graph to Continuum Diffusion Model with Connectivity FunctionalYahong Yang, Sun Lee, Jeff Calder et al.
We derive an energy-based continuum limit for $\varepsilon$-graphs endowed with a general connectivity functional. We prove that the discrete energy and its continuum counterpart differ by at most $O(\varepsilon)$; the prefactor involves only the $W^{1,1}$-norm of the connectivity density as $\varepsilon\to0$, so the error bound remains valid even when that density has strong local fluctuations. As an application, we introduce a neural-network procedure that reconstructs the connectivity density from edge-weight data and then embeds the resulting continuum model into a brain-dynamics framework. In this setting, the usual constant diffusion coefficient is replaced by the spatially varying coefficient produced by the learned density, yielding dynamics that differ significantly from those obtained with conventional constant-diffusion models.
LGDec 11, 2024
GLL: A Differentiable Graph Learning Layer for Neural NetworksJason Brown, Bohan Chen, Harris Hardiman-Mostow et al.
Standard deep learning architectures used for classification generate label predictions with a projection head and softmax activation function. Although successful, these methods fail to leverage the relational information between samples in the batch for generating label predictions. In recent works, graph-based learning techniques, namely Laplace learning, have been heuristically combined with neural networks for both supervised and semi-supervised learning (SSL) tasks. However, prior works approximate the gradient of the loss function with respect to the graph learning algorithm or decouple the processes; end-to-end integration with neural networks is not achieved. In this work, we derive backpropagation equations, via the adjoint method, for inclusion of a general family of graph learning layers into a neural network. This allows us to precisely integrate graph Laplacian-based label propagation into a neural network layer, replacing a projection head and softmax activation function for classification tasks. Using this new framework, our experimental results demonstrate smooth label transitions across data, improved robustness to adversarial attacks, improved generalization, and improved training dynamics compared to the standard softmax-based approach.
LGNov 15, 2024
Attraction-Repulsion Swarming: A Generalized Framework of t-SNE via Force Normalization and Tunable InteractionsJingcheng Lu, Jeff Calder
We propose a new method for data visualization based on attraction-repulsion swarming (ARS) dynamics, which we call ARS visualization. ARS is a generalized framework that is based on viewing the t-distributed stochastic neighbor embedding (t-SNE) visualization technique as a swarm of interacting agents driven by attraction and repulsion. Motivated by recent developments in swarming, we modify the t-SNE dynamics to include a normalization by the \emph{total influence}, which results in better posed dynamics in which we can use a data size independent time step (of $h=1$) and a simple iteration, without the need for the array of optimization tricks employed in t-SNE. ARS also includes the ability to separately tune the attraction and repulsion kernels, which gives the user control over the tightness within clusters and the spacing between them in the visualization. In contrast with t-SNE, our proposed ARS data visualization method is not gradient descent on the Kullback-Leibler divergence, and can be viewed solely as an interacting particle system driven by attraction and repulsion forces. We provide theoretical results illustrating how the choice of interaction kernel affects the dynamics, and experimental results to validate our method and compare to t-SNE on the MNIST and Cifar-10 data sets.
NAJun 9, 2024
Numerical solution of a PDE arising from prediction with expert adviceJeff Calder, Nadejda Drenska, Drisana Mosaphir
This work investigates the online machine learning problem of prediction with expert advice in an adversarial setting through numerical analysis of, and experiments with, a related partial differential equation. The problem is a repeated two-person game involving decision-making at each step informed by $n$ experts in an adversarial environment. The continuum limit of this game over a large number of steps is a degenerate elliptic equation whose solution encodes the optimal strategies for both players. We develop numerical methods for approximating the solution of this equation in relatively high dimensions ($n\leq 10$) by exploiting symmetries in the equation and the solution to drastically reduce the size of the computational domain. Based on our numerical results we make a number of conjectures about the optimality of various adversarial strategies, in particular about the non-optimality of the COMB strategy.
LGMar 31, 2022
Graph-based Active Learning for Semi-supervised Classification of SAR DataKevin Miller, John Mauro, Jason Setiadi et al.
We present a novel method for classification of Synthetic Aperture Radar (SAR) data by combining ideas from graph-based learning and neural network methods within an active learning framework. Graph-based methods in machine learning are based on a similarity graph constructed from the data. When the data consists of raw images composed of scenes, extraneous information can make the classification task more difficult. In recent years, neural network methods have been shown to provide a promising framework for extracting patterns from SAR images. These methods, however, require ample training data to avoid overfitting. At the same time, such training data are often unavailable for applications of interest, such as automatic target recognition (ATR) and SAR data. We use a Convolutional Neural Network Variational Autoencoder (CNNVAE) to embed SAR data into a feature space, and then construct a similarity graph from the embedded data and apply graph-based semi-supervised learning techniques. The CNNVAE feature embedding and graph construction requires no labeled data, which reduces overfitting and improves the generalization performance of graph learning at low label rates. Furthermore, the method easily incorporates a human-in-the-loop for active learning in the data-labeling process. We present promising results and compare them to other standard machine learning methods on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset for ATR with small amounts of labeled data.
APFeb 17, 2022
Hamilton-Jacobi equations on graphs with applications to semi-supervised learning and data depthJeff Calder, Mahmood Ettehad
Shortest path graph distances are widely used in data science and machine learning, since they can approximate the underlying geodesic distance on the data manifold. However, the shortest path distance is highly sensitive to the addition of corrupted edges in the graph, either through noise or an adversarial perturbation. In this paper we study a family of Hamilton-Jacobi equations on graphs that we call the $p$-eikonal equation. We show that the $p$-eikonal equation with $p=1$ is a provably robust distance-type function on a graph, and the $p\to \infty$ limit recovers shortest path distances. While the $p$-eikonal equation does not correspond to a shortest-path graph distance, we nonetheless show that the continuum limit of the $p$-eikonal equation on a random geometric graph recovers a geodesic density weighted distance in the continuum. We consider applications of the $p$-eikonal equation to data depth and semi-supervised learning, and use the continuum limit to prove asymptotic consistency results for both applications. Finally, we show the results of experiments with data depth and semi-supervised learning on real image datasets, including MNIST, FashionMNIST and CIFAR-10, which show that the $p$-eikonal equation offers significantly better results compared to shortest path distances.
NANov 24, 2021
Uniform Convergence Rates for Lipschitz Learning on GraphsLeon Bungert, Jeff Calder, Tim Roith
Lipschitz learning is a graph-based semi-supervised learning method where one extends labels from a labeled to an unlabeled data set by solving the infinity Laplace equation on a weighted graph. In this work we prove uniform convergence rates for solutions of the graph infinity Laplace equation as the number of vertices grows to infinity. Their continuum limits are absolutely minimizing Lipschitz extensions with respect to the geodesic metric of the domain where the graph vertices are sampled from. We work under very general assumptions on the graph weights, the set of labeled vertices, and the continuum domain. Our main contribution is that we obtain quantitative convergence rates even for very sparsely connected graphs, as they typically appear in applications like semi-supervised learning. In particular, our framework allows for graph bandwidths down to the connectivity radius. For proving this we first show a quantitative convergence statement for graph distance functions to geodesic distance functions in the continuum. Using the "comparison with distance functions" principle, we can pass these convergence statements to infinity harmonic functions and absolutely minimizing Lipschitz extensions.
NANov 5, 2021
Boundary Estimation from Point Clouds: Algorithms, Guarantees and ApplicationsJeff Calder, Sangmin Park, Dejan Slepčev
We investigate identifying the boundary of a domain from sample points in the domain. We introduce new estimators for the normal vector to the boundary, distance of a point to the boundary, and a test for whether a point lies within a boundary strip. The estimators can be efficiently computed and are more accurate than the ones present in the literature. We provide rigorous error estimates for the estimators. Furthermore we use the detected boundary points to solve boundary-value problems for PDE on point clouds. We prove error estimates for the Laplace and eikonal equations on point clouds. Finally we provide a range of numerical experiments illustrating the performance of our boundary estimators, applications to PDE on point clouds, and tests on image data sets.
OCAug 31, 2020
Asymptotically optimal strategies for online prediction with history-dependent expertsJeff Calder, Nadejda Drenska
We establish sharp asymptotically optimal strategies for the problem of online prediction with history dependent experts. The prediction problem is played (in part) over a discrete graph called the $d$ dimensional de Bruijn graph, where $d$ is the number of days of history used by the experts. Previous work [11] established $O(\varepsilon)$ optimal strategies for $n=2$ experts and $d\leq 4$ days of history, while [10] established $O(\varepsilon^{1/3})$ optimal strategies for all $n\geq 2$ and all $d\geq 1$, where the game is played for $N$ steps and $\varepsilon=N^{-1/2}$. In this paper, we show that the optimality conditions over the de Bruijn graph correspond to a graph Poisson equation, and we establish $O(\varepsilon)$ optimal strategies for all values of $n$ and $d$.
APJul 31, 2020
Online Prediction With History-Dependent Experts: The General CaseNadejda Drenska, Jeff Calder
We study the problem of prediction of binary sequences with expert advice in the online setting, which is a classic example of online machine learning. We interpret the binary sequence as the price history of a stock, and view the predictor as an investor, which converts the problem into a stock prediction problem. In this framework, an investor, who predicts the daily movements of a stock, and an adversarial market, who controls the stock, play against each other over $N$ turns. The investor combines the predictions of $n\geq 2$ experts in order to make a decision about how much to invest at each turn, and aims to minimize their regret with respect to the best-performing expert at the end of the game. We consider the problem with history-dependent experts, in which each expert uses the previous $d$ days of history of the market in making their predictions. We prove that the value function for this game, rescaled appropriately, converges as $N\to \infty$ at a rate of $O(N^{-1/6})$ to the viscosity solution of a nonlinear degenerate elliptic PDE, which can be understood as the Hamilton-Jacobi-Issacs equation for the two-person game. As a result, we are able to deduce asymptotically optimal strategies for the investor. Our results extend those established by the first author and R.V.Kohn [13] for $n=2$ experts and $d\leq 4$ days of history. To appear in Communications on Pure and Applied Mathematics.
APJul 13, 2020
Lipschitz regularity of graph Laplacians on random data cloudsJeff Calder, Nicolas Garcia Trillos, Marta Lewicka
In this paper we study Lipschitz regularity of elliptic PDEs on geometric graphs, constructed from random data points. The data points are sampled from a distribution supported on a smooth manifold. The family of equations that we study arises in data analysis in the context of graph-based learning and contains, as important examples, the equations satisfied by graph Laplacian eigenvectors. In particular, we prove high probability interior and global Lipschitz estimates for solutions of graph Poisson equations. Our results can be used to show that graph Laplacian eigenvectors are, with high probability, essentially Lipschitz regular with constants depending explicitly on their corresponding eigenvalues. Our analysis relies on a probabilistic coupling argument of suitable random walks at the continuum level, and an interpolation method for extending functions on random point clouds to the continuum manifold. As a byproduct of our general regularity results, we obtain high probability $L^\infty$ and approximate $\mathcal{C}^{0,1}$ convergence rates for the convergence of graph Laplacian eigenvectors towards eigenfunctions of the corresponding weighted Laplace-Beltrami operators. The convergence rates we obtain scale like the $L^2$-convergence rates established by two of the authors in previous work.
LGJun 19, 2020
Poisson Learning: Graph Based Semi-Supervised Learning At Very Low Label RatesJeff Calder, Brendan Cook, Matthew Thorpe et al.
We propose a new framework, called Poisson learning, for graph based semi-supervised learning at very low label rates. Poisson learning is motivated by the need to address the degeneracy of Laplacian semi-supervised learning in this regime. The method replaces the assignment of label values at training points with the placement of sources and sinks, and solves the resulting Poisson equation on the graph. The outcomes are provably more stable and informative than those of Laplacian learning. Poisson learning is efficient and simple to implement, and we present numerical experiments showing the method is superior to other recent approaches to semi-supervised learning at low label rates on MNIST, FashionMNIST, and Cifar-10. We also propose a graph-cut enhancement of Poisson learning, called Poisson MBO, that gives higher accuracy and can incorporate prior knowledge of relative class sizes.
STJun 4, 2020
Rates of Convergence for Laplacian Semi-Supervised Learning with Low Labeling RatesJeff Calder, Dejan Slepčev, Matthew Thorpe
We study graph-based Laplacian semi-supervised learning at low labeling rates. Laplacian learning uses harmonic extension on a graph to propagate labels. At very low label rates, Laplacian learning becomes degenerate and the solution is roughly constant with spikes at each labeled data point. Previous work has shown that this degeneracy occurs when the number of labeled data points is finite while the number of unlabeled data points tends to infinity. In this work we allow the number of labeled data points to grow to infinity with the number of labels. Our results show that for a random geometric graph with length scale $\varepsilon>0$ and labeling rate $β>0$, if $β\ll\varepsilon^2$ then the solution becomes degenerate and spikes form, and if $β\gg \varepsilon^2$ then Laplacian learning is well-posed and consistent with a continuum Laplace equation. Furthermore, in the well-posed setting we prove quantitative error estimates of $O(\varepsilonβ^{-1/2})$ for the difference between the solutions of the discrete problem and continuum PDE, up to logarithmic factors. We also study $p$-Laplacian regularization and show the same degeneracy result when $β\ll \varepsilon^p$. The proofs of our well-posedness results use the random walk interpretation of Laplacian learning and PDE arguments, while the proofs of the ill-posedness results use $Γ$-convergence tools from the calculus of variations. We also present numerical results on synthetic and real data to illustrate our results.
APJan 24, 2020
A continuum limit for the PageRank algorithmAmber Yuan, Jeff Calder, Braxton Osting
Semi-supervised and unsupervised machine learning methods often rely on graphs to model data, prompting research on how theoretical properties of operators on graphs are leveraged in learning problems. While most of the existing literature focuses on undirected graphs, directed graphs are very important in practice, giving models for physical, biological, or transportation networks, among many other applications. In this paper, we propose a new framework for rigorously studying continuum limits of learning algorithms on directed graphs. We use the new framework to study the PageRank algorithm, and show how it can be interpreted as a numerical scheme on a directed graph involving a type of normalized graph Laplacian. We show that the corresponding continuum limit problem, which is taken as the number of webpages grows to infinity, is a second-order, possibly degenerate, elliptic equation that contains reaction, diffusion, and advection terms. We prove that the numerical scheme is consistent and stable and compute explicit rates of convergence of the discrete solution to the solution of the continuum limit PDE. We give applications to proving stability and asymptotic regularity of the PageRank vector. Finally, we illustrate our results with numerical experiments and explore an application to data depth.
PROct 29, 2019
Improved spectral convergence rates for graph Laplacians on epsilon-graphs and k-NN graphsJeff Calder, Nicolas Garcia Trillos
In this paper we improve the spectral convergence rates for graph-based approximations of Laplace-Beltrami operators constructed from random data. We utilize regularity of the continuum eigenfunctions and strong pointwise consistency results to prove that spectral convergence rates are the same as the pointwise consistency rates for graph Laplacians. In particular, for an optimal choice of the graph connectivity $\varepsilon$, our results show that the eigenvalues and eigenvectors of the graph Laplacian converge to those of the Laplace-Beltrami operator at a rate of $O(n^{-1/(m+4)})$, up to log factors, where $m$ is the manifold dimension and $n$ is the number of vertices in the graph. Our approach is general and allows us to analyze a large variety of graph constructions that include $\varepsilon$-graphs and $k$-NN graphs.
NAMay 6, 2019
Computation of Circular Area and Spherical Volume Invariants via Boundary IntegralsRiley O'Neill, Pedro Angulo-Umana, Jeff Calder et al.
We show how to compute the circular area invariant of planar curves, and the spherical volume invariant of surfaces, in terms of line and surface integrals, respectively. We use the Divergence Theorem to express the area and volume integrals as line and surface integrals, respectively, against particular kernels; our results also extend to higher dimensional hypersurfaces. The resulting surface integrals are computable analytically on a triangulated mesh. This gives a simple computational algorithm for computing the spherical volume invariant for triangulated surfaces that does not involve discretizing the ambient space. We discuss potential applications to feature detection on broken bone fragments of interest in anthropology.
NAJan 15, 2019
Analysis and algorithms for $\ell_p$-based semi-supervised learning on graphsMauricio Flores, Jeff Calder, Gilad Lerman
This paper addresses theory and applications of $\ell_p$-based Laplacian regularization in semi-supervised learning. The graph $p$-Laplacian for $p>2$ has been proposed recently as a replacement for the standard ($p=2$) graph Laplacian in semi-supervised learning problems with very few labels, where Laplacian learning is degenerate. In the first part of the paper we prove new discrete to continuum convergence results for $p$-Laplace problems on $k$-nearest neighbor ($k$-NN) graphs, which are more commonly used in practice than random geometric graphs. Our analysis shows that, on $k$-NN graphs, the $p$-Laplacian retains information about the data distribution as $p\to \infty$ and Lipschitz learning ($p=\infty$) is sensitive to the data distribution. This situation can be contrasted with random geometric graphs, where the $p$-Laplacian forgets the data distribution as $p\to \infty$. We also present a general framework for proving discrete to continuum convergence results in graph-based learning that only requires pointwise consistency and monotonicity. In the second part of the paper, we develop fast algorithms for solving the variational and game-theoretic $p$-Laplace equations on weighted graphs for $p>2$. We present several efficient and scalable algorithms for both formulations, and present numerical results on synthetic data indicating their convergence properties. Finally, we conduct extensive numerical experiments on the MNIST, FashionMNIST and EMNIST datasets that illustrate the effectiveness of the $p$-Laplacian formulation for semi-supervised learning with few labels. In particular, we find that Lipschitz learning ($p=\infty$) performs well with very few labels on $k$-NN graphs, which experimentally validates our theoretical findings that Lipschitz learning retains information about the data distribution (the unlabeled data) on $k$-NN graphs.
APOct 10, 2018
Properly-weighted graph Laplacian for semi-supervised learningJeff Calder, Dejan Slepcev
The performance of traditional graph Laplacian methods for semi-supervised learning degrades substantially as the ratio of labeled to unlabeled data decreases, due to a degeneracy in the graph Laplacian. Several approaches have been proposed recently to address this, however we show that some of them remain ill-posed in the large-data limit. In this paper, we show a way to correctly set the weights in Laplacian regularization so that the estimator remains well posed and stable in the large-sample limit. We prove that our semi-supervised learning algorithm converges, in the infinite sample size limit, to the smooth solution of a continuum variational problem that attains the labeled values continuously. Our method is fast and easy to implement.
LGAug 28, 2018
Lipschitz regularized Deep Neural Networks generalize and are adversarially robustChris Finlay, Jeff Calder, Bilal Abbasi et al.
In this work we study input gradient regularization of deep neural networks, and demonstrate that such regularization leads to generalization proofs and improved adversarial robustness. The proof of generalization does not overcome the curse of dimensionality, but it is independent of the number of layers in the networks. The adversarial robustness regularization combines adversarial training, which we show to be equivalent to Total Variation regularization, with Lipschitz regularization. We demonstrate empirically that the regularized models are more robust, and that gradient norms of images can be used for attack detection.
NASep 30, 2018
Accelerated PDE's for efficient solution of regularized inversion problemsMinas Benyamin, Jeff Calder, Ganesh Sundaramoorthi et al.
We further develop a new framework, called PDE Acceleration, by applying it to calculus of variations problems defined for general functions on $\mathbb{R}^n$, obtaining efficient numerical algorithms to solve the resulting class of optimization problems based on simple discretizations of their corresponding accelerated PDE's. While the resulting family of PDE's and numerical schemes are quite general, we give special attention to their application for regularized inversion problems, with particular illustrative examples on some popular image processing applications. The method is a generalization of momentum, or accelerated, gradient descent to the PDE setting. For elliptic problems, the descent equations are a nonlinear damped wave equation, instead of a diffusion equation, and the acceleration is realized as an improvement in the CFL condition from $Δt\sim Δx^{2}$ (for diffusion) to $Δt\sim Δx$ (for wave equations). We work out several explicit as well as a semi-implicit numerical schemes, together with their necessary stability constraints, and include recursive update formulations which allow minimal-effort adaptation of existing gradient descent PDE codes into the accelerated PDE framework. We explore these schemes more carefully for a broad class of regularized inversion applications, with special attention to quadratic, Beltrami, and Total Variation regularization, where the accelerated PDE takes the form of a nonlinear wave equation. Experimental examples demonstrate the application of these schemes for image denoising, deblurring, and inpainting, including comparisons against Primal Dual, Split Bregman, and ADMM algorithms.
APNov 28, 2017
The game theoretic p-Laplacian and semi-supervised learning with few labelsJeff Calder
We study the game theoretic p-Laplacian for semi-supervised learning on graphs, and show that it is well-posed in the limit of finite labeled data and infinite unlabeled data. In particular, we show that the continuum limit of graph-based semi-supervised learning with the game theoretic p-Laplacian is a weighted version of the continuous p-Laplace equation. We also prove that solutions to the graph p-Laplace equation are approximately Holder continuous with high probability. Our proof uses the viscosity solution machinery and the maximum principle on a graph.
APOct 28, 2017
Consistency of Lipschitz learning with infinite unlabeled data and finite labeled dataJeff Calder
We study the consistency of Lipschitz learning on graphs in the limit of infinite unlabeled data and finite labeled data. Previous work has conjectured that Lipschitz learning is well-posed in this limit, but is insensitive to the distribution of the unlabeled data, which is undesirable for semi-supervised learning. We first prove that this conjecture is true in the special case of a random geometric graph model with kernel-based weights. Then we go on to show that on a random geometric graph with self-tuning weights, Lipschitz learning is in fact highly sensitive to the distribution of the unlabeled data, and we show how the degree of sensitivity can be adjusted by tuning the weights. In both cases, our results follow from showing that the sequence of learned functions converges to the viscosity solution of an $\infty$-Laplace type equation, and studying the structure of the limiting equation.
LGAug 15, 2016
Anomaly detection and classification for streaming data using PDEsBilal Abbasi, Jeff Calder, Adam M. Oberman
Nondominated sorting, also called Pareto Depth Analysis (PDA), is widely used in multi-objective optimization and has recently found important applications in multi-criteria anomaly detection. Recently, a partial differential equation (PDE) continuum limit was discovered for nondominated sorting leading to a very fast approximate sorting algorithm called PDE-based ranking. We propose in this paper a fast real-time streaming version of the PDA algorithm for anomaly detection that exploits the computational advantages of PDE continuum limits. Furthermore, we derive new PDE continuum limits for sorting points within their nondominated layers and show how the new PDEs can be used to classify anomalies based on which criterion was more significantly violated. We also prove statistical convergence rates for PDE-based ranking, and present the results of numerical experiments with both synthetic and real data.
CVAug 20, 2015
Multi-criteria Similarity-based Anomaly Detection using Pareto Depth AnalysisKo-Jen Hsiao, Kevin S. Xu, Jeff Calder et al.
We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of similarity or dissimilarity, e.g.~as measured by nearest neighbor Euclidean distances between a test sample and the training samples. In many application domains there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such cases, multiple dissimilarity measures can be defined, including non-metric measures, and one can test for anomalies by scalarizing using a non-negative linear combination of them. If the relative importance of the different dissimilarity measures are not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multi-criteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria and shows superior performance on experiments with synthetic and real data sets.
NAAug 6, 2015
Numerical schemes and rates of convergence for the Hamilton-Jacobi equation continuum limit of nondominated sortingJeff Calder
Nondominated sorting arranges a set of points in Euclidean space into layers by repeatedly removing the coordinatewise minimal elements. It was recently shown that nondominated sorting of random points has a Hamilton-Jacobi equation continuum limit. The obvious numerical scheme for this PDE has a slow convergence rate of O(h^1/n) for a grid of spacing h>0 in dimension n. In this paper, we introduce two new numerical schemes that have formal rates of O(h) and we prove the usual O(h^1/2) theoretical rates. We also present the results of numerical simulations illustrating the difference between the formal and theoretical rates.
APDec 13, 2014
Directed last passage percolation with discontinuous weightsJeff Calder
We prove that a directed last passage percolation model with discontinuous macroscopic (non-random) inhomogeneities has a continuum limit that corresponds to solving a Hamilton-Jacobi equation in the viscosity sense. This Hamilton-Jacobi equation is closely related to the conservation law for the hydrodynamic limit of the totally asymmetric simple exclusion process. We also prove convergence of a numerical scheme for the Hamilton-Jacobi equation and present an algorithm based on dynamic programming for finding the asymptotic shapes of maximal directed paths.
IRFeb 21, 2014
Pareto-depth for Multiple-query Image RetrievalKo-Jen Hsiao, Jeff Calder, Alfred O. Hero
Most content-based image retrieval systems consider either one single query, or multiple queries that include the same object or represent the same semantic information. In this paper we consider the content-based image retrieval problem for multiple query images corresponding to different image semantics. We propose a novel multiple-query information retrieval algorithm that combines the Pareto front method (PFM) with efficient manifold ranking (EMR). We show that our proposed algorithm outperforms state of the art multiple-query retrieval algorithms on real-world image databases. We attribute this performance improvement to concavity properties of the Pareto fronts, and prove a theoretical result that characterizes the asymptotic concavity of the fronts.