Melanie Weber

LG
h-index101
36papers
338citations
Novelty52%
AI Score56

36 Papers

61.0LGJun 2
Contrastive Neural Algorithmic Reasoning for Graph Coloring

Thien Le, Tianyu Zhao, Melanie Weber

Graph coloring seeks to assigns colors to a graph's nodes so that adjacent nodes receive different colors, using as few colors as possible. Here, we study approximate $k$-coloring, where the goal is to use at most $k$ colors while minimizing the number of monochromatic edges. This problem is central to graph theory and has applications in areas such as scheduling and resource allocation. Recent unsupervised GNN approaches optimize each instance directly, precluding generalization across graph sizes and distributions. We instead propose a contrastive learning framework that learns transferable coloring geometry where the embeddings of same-color nodes align, while adjacent nodes' representations are pushed toward distinct directions. We analyze the resulting population objective over bounded-size graphs. For unit-norm embeddings, we show that its optima have a line-prototype structure: Representations of nodes of the same color collapse to a shared one-dimensional subspace, and edges connect orthogonal subspaces. This geometry yields stationarity conditions in the supervised setting and is preserved by projected subgradient dynamics under a balanced-coloring assumption. In an unnormalized variant, gradient descent has a max-margin bias governed by a quotient-graph hard-margin problem. Experiments on synthetic and real-world graphs show that contrastive GNN encoders generalize effectively and produce low-conflict colorings, matching and sometimes improving on greedy approaches.

CLDec 16, 2022
LegalRelectra: Mixed-domain Language Modeling for Long-range Legal Text Comprehension

Wenyue Hua, Yuchen Zhang, Zhe Chen et al. · meta-ai, princeton

The application of Natural Language Processing (NLP) to specialized domains, such as the law, has recently received a surge of interest. As many legal services rely on processing and analyzing large collections of documents, automating such tasks with NLP tools emerges as a key challenge. Many popular language models, such as BERT or RoBERTa, are general-purpose models, which have limitations on processing specialized legal terminology and syntax. In addition, legal documents may contain specialized vocabulary from other domains, such as medical terminology in personal injury text. Here, we propose LegalRelectra, a legal-domain language model that is trained on mixed-domain legal and medical corpora. We show that our model improves over general-domain and single-domain medical and legal language models when processing mixed-domain (personal injury) text. Our training architecture implements the Electra framework, but utilizes Reformer instead of BERT for its generator and discriminator. We show that this improves the model's performance on processing long passages and results in better long-range text comprehension.

LGSep 17, 2023
Mitigating Over-Smoothing and Over-Squashing using Augmentations of Forman-Ricci Curvature

Lukas Fesser, Melanie Weber · princeton

While Graph Neural Networks (GNNs) have been successfully leveraged for learning on graph-structured data across domains, several potential pitfalls have been described recently. Those include the inability to accurately leverage information encoded in long-range connections (over-squashing), as well as difficulties distinguishing the learned representations of nearby nodes with growing network depth (over-smoothing). An effective way to characterize both effects is discrete curvature: Long-range connections that underlie over-squashing effects have low curvature, whereas edges that contribute to over-smoothing have high curvature. This observation has given rise to rewiring techniques, which add or remove edges to mitigate over-smoothing and over-squashing. Several rewiring approaches utilizing graph characteristics, such as curvature or the spectrum of the graph Laplacian, have been proposed. However, existing methods, especially those based on curvature, often require expensive subroutines and careful hyperparameter tuning, which limits their applicability to large-scale graphs. Here we propose a rewiring technique based on Augmented Forman-Ricci curvature (AFRC), a scalable curvature notation, which can be computed in linear time. We prove that AFRC effectively characterizes over-smoothing and over-squashing effects in message-passing GNNs. We complement our theoretical results with experiments, which demonstrate that the proposed approach achieves state-of-the-art performance while significantly reducing the computational cost in comparison with other methods. Utilizing fundamental properties of discrete curvature, we propose effective heuristics for hyperparameters in curvature-based rewiring, which avoids expensive hyperparameter searches, further improving the scalability of the proposed approach.

OCJun 22, 2022
On a class of geodesically convex optimization problems solved via Euclidean MM methods

Melanie Weber, Suvrit Sra

We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions. This structure arises in several optimization problems in statistics and machine learning, e.g., for matrix scaling, M-estimators for covariances, and Brascamp-Lieb inequalities. Our work offers efficient algorithms that on the one hand exploit g-convexity to ensure global optimality along with guarantees on iteration complexity. On the other hand, the split structure permits us to develop Euclidean Majorization-Minorization algorithms that help us bypass the need to compute expensive Riemannian operations such as exponential maps and parallel transport. We illustrate our results by specializing them to a few concrete optimization problems that have been previously studied in the machine learning literature. Ultimately, we hope our work helps motivate the broader search for mixed Euclidean-Riemannian optimization algorithms

SIJul 19, 2023
Curvature-based Clustering on Graphs

Yu Tian, Zachary Lubberts, Melanie Weber · princeton

Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms, which exploit the geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric flows, under which the edge weights of the graph evolve to reveal its community structure. We consider several discrete curvature notions and analyze the utility of the resulting algorithms. In contrast to prior literature, we study not only single-membership community detection, where each node belongs to exactly one community, but also mixed-membership community detection, where communities may overlap. For the latter, we argue that it is beneficial to perform community detection on the line graph, i.e., the graph's dual. We provide both theoretical and empirical evidence for the utility of our curvature-based clustering algorithms. In addition, we give several results on the relationship between the curvature of a graph and that of its dual, which enable the efficient implementation of our proposed mixed-membership community detection approach and which may be of independent interest for curvature-based network analysis.

DGJul 5, 2023
Continuum Limits of Ollivier's Ricci Curvature on data clouds: pointwise consistency and global lower bounds

Nicolas Garcia Trillos, Melanie Weber

Let $M$ denote a low-dimensional manifold embedded in Euclidean space and let ${X}= \{ x_1, \dots, x_n \}$ be a collection of points uniformly sampled from it. We study the relationship between the curvature of a random geometric graph built from ${X}$ and the curvature of the manifold $M$ via continuum limits of Ollivier's discrete Ricci curvature. We prove pointwise, non-asymptotic consistency results and also show that if $M$ has Ricci curvature bounded from below by a positive constant, then the random geometric graph will inherit this global structural property with high probability. We discuss applications of the global discrete curvature bounds to contraction properties of heat kernels on graphs, as well as implications for manifold learning from data clouds. In particular, we show that our consistency results allow for estimating the intrinsic curvature of a manifold by first estimating concrete extrinsic quantities.

LGNov 3, 2025
Priors in Time: Missing Inductive Biases for Language Model Interpretability

Ekdeep Singh Lubana, Can Rager, Sai Sumedh R. Hindupur et al.

Recovering meaningful concepts from language model activations is a central aim of interpretability. While existing feature extraction methods aim to identify concepts that are independent directions, it is unclear if this assumption can capture the rich temporal structure of language. Specifically, via a Bayesian lens, we demonstrate that Sparse Autoencoders (SAEs) impose priors that assume independence of concepts across time, implying stationarity. Meanwhile, language model representations exhibit rich temporal dynamics, including systematic growth in conceptual dimensionality, context-dependent correlations, and pronounced non-stationarity, in direct conflict with the priors of SAEs. Taking inspiration from computational neuroscience, we introduce a new interpretability objective -- Temporal Feature Analysis -- which possesses a temporal inductive bias to decompose representations at a given time into two parts: a predictable component, which can be inferred from the context, and a residual component, which captures novel information unexplained by the context. Temporal Feature Analyzers correctly parse garden path sentences, identify event boundaries, and more broadly delineate abstract, slow-moving information from novel, fast-moving information, while existing SAEs show significant pitfalls in all the above tasks. Overall, our results underscore the need for inductive biases that match the data in designing robust interpretability tools.

44.8LGMay 19
Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization

Thien Le, Melanie Weber

Distillation transfers knowledge from a large model trained on broad data to a smaller, more efficient model suitable for deployment. In structured prediction settings, prior knowledge about the task can guide the choice of a target architecture that is algorithmically aligned with the underlying problem. Building on recent learning-theoretic analyses of decision-tree (DT) distillation (Boix-Adsera, 2024), we study when distillation succeeds for combinatorial optimization tasks. We focus on the case where the target model is a graph neural network whose architecture is aligned with a dynamic programming (DP) algorithm for the task. Assuming that the source model is sufficiently rich, formalized through the linear representation hypothesis (LRH) (Elhage et al., 2022; Park et al., 2024), we show that the distillation problem can be solved efficiently in the complexity parameters of the DP transition function, represented as a DT. Our results provide a rigorous sufficient condition for successful distillation in the flavour of algorithmic alignment.

LGNov 24, 2023
Effective Structural Encodings via Local Curvature Profiles

Lukas Fesser, Melanie Weber

Structural and Positional Encodings can significantly improve the performance of Graph Neural Networks in downstream tasks. Recent literature has begun to systematically investigate differences in the structural properties that these approaches encode, as well as performance trade-offs between them. However, the question of which structural properties yield the most effective encoding remains open. In this paper, we investigate this question from a geometric perspective. We propose a novel structural encoding based on discrete Ricci curvature (Local Curvature Profiles, short LCP) and show that it significantly outperforms existing encoding approaches. We further show that combining local structural encodings, such as LCP, with global positional encodings improves downstream performance, suggesting that they capture complementary geometric information. Finally, we compare different encoding types with (curvature-based) rewiring techniques. Rewiring has recently received a surge of interest due to its ability to improve the performance of Graph Neural Networks by mitigating over-smoothing and over-squashing effects. Our results suggest that utilizing curvature information for structural encodings delivers significantly larger performance increases than rewiring.

LGJul 5, 2024
Graph Pooling via Ricci Flow

Amy Feng, Melanie Weber

Graph Machine Learning often involves the clustering of nodes based on similarity structure encoded in the graph's topology and the nodes' attributes. On homophilous graphs, the integration of pooling layers has been shown to enhance the performance of Graph Neural Networks by accounting for inherent multi-scale structure. Here, similar nodes are grouped together to coarsen the graph and reduce the input size in subsequent layers in deeper architectures. In both settings, the underlying clustering approach can be implemented via graph pooling operators, which often rely on classical tools from Graph Theory. In this work, we introduce a graph pooling operator (ORC-Pool), which utilizes a characterization of the graph's geometry via Ollivier's discrete Ricci curvature and an associated geometric flow. Previous Ricci flow based clustering approaches have shown great promise across several domains, but are by construction unable to account for similarity structure encoded in the node attributes. However, in many ML applications, such information is vital for downstream tasks. ORC-Pool extends such clustering approaches to attributed graphs, allowing for the integration of geometric coarsening into Graph Neural Networks as a pooling layer.

CVFeb 25
Protein Graph Neural Networks for Heterogeneous Cryo-EM Reconstruction

Jonathan Krook, Axel Janson, Joakim andén et al.

We present a geometry-aware method for heterogeneous single-particle cryogenic electron microscopy (cryo-EM) reconstruction that predicts atomic backbone conformations. To incorporate protein-structure priors, we represent the backbone as a graph and use a graph neural network (GNN) autodecoder that maps per-image latent variables to 3D displacements of a template conformation. The objective combines a data-discrepancy term based on a differentiable cryo-EM forward model with geometric regularization, and it supports unknown orientations via ellipsoidal support lifting (ESL) pose estimation. On synthetic datasets derived from molecular dynamics trajectories, the proposed GNN achieves higher accuracy compared to a multilayer perceptron (MLP) of comparable size, highlighting the benefits of a geometry-informed inductive bias.

LGJan 3, 2024
On the hardness of learning under symmetries

Bobak T. Kiani, Thien Le, Hannah Lawrence et al.

We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected (i.e. non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, which all scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard.

OCJul 7, 2024
Disciplined Geodesically Convex Programming

Andrew Cheng, Vaibhav Dixit, Melanie Weber

Convex programming plays a fundamental role in machine learning, data science, and engineering. Testing convexity structure in nonlinear programs relies on verifying the convexity of objectives and constraints. Grant et al. (2006) introduced a framework, Disciplined Convex Programming (DCP), for automating this verification task for a wide range of convex functions that can be decomposed into basic convex functions (atoms) using convexity-preserving compositions and transformations (rules). Here, we extend this framework to functions defined on manifolds with non-positive curvature (Hadamard manifolds) by introducing Disciplined Geodesically Convex Programming (DGCP). In particular, this allows for verifying a broader range of convexity notions. For instance, many notable instances of statistical estimators and matrix-valued (sub)routines in machine learning applications are Euclidean non-convex, but exhibit geodesic convexity through a more general Riemannian lens. To define the DGCP framework, we determine convexity-preserving compositions and transformations for geodesically convex functions on general Hadamard manifolds, as well as for the special case of symmetric positive definite matrices, a common setting in matrix-valued optimization. For the latter, we also define a basic set of atoms. Our paper is accompanied by a Julia package SymbolicAnalysis.jl, which provides functionality for testing and certifying DGCP-compliant expressions. Our library interfaces with manifold optimization software, which allows for directly solving verified geodesically convex programs.

CLMar 27, 2025
Shared Global and Local Geometry of Language Model Embeddings

Andrew Lee, Melanie Weber, Fernanda Viégas et al.

Researchers have recently suggested that models share common representations. In our work, we find numerous geometric similarities across the token embeddings of large language models. First, we find ``global'' similarities: token embeddings often share similar relative orientations. Next, we characterize local geometry in two ways: (1) by using Locally Linear Embeddings, and (2) by defining a simple measure for the intrinsic dimension of each embedding. Both characterizations allow us to find local similarities across token embeddings. Additionally, our intrinsic dimension demonstrates that embeddings lie on a lower dimensional manifold, and that tokens with lower intrinsic dimensions often have semantically coherent clusters, while those with higher intrinsic dimensions do not. Based on our findings, we introduce EMB2EMB, a simple application to linearly transform steering vectors from one language model to another, despite the two models having different dimensions.

57.2LGApr 8
Latent Structure of Affective Representations in Large Language Models

Benjamin J. Choi, Melanie Weber

The geometric structure of latent representations in large language models (LLMs) is an active area of research, driven in part by its implications for model transparency and AI safety. Existing literature has focused mainly on general geometric and topological properties of the learnt representations, but due to a lack of ground-truth latent geometry, validating the findings of such approaches is challenging. Emotion processing provides an intriguing testbed for probing representational geometry, as emotions exhibit both categorical organization and continuous affective dimensions, which are well-established in the psychology literature. Moreover, understanding such representations carries safety relevance. In this work, we investigate the latent structure of affective representations in LLMs using geometric data analysis tools. We present three main findings. First, we show that LLMs learn coherent latent representations of affective emotions that align with widely used valence--arousal models from psychology. Second, we find that these representations exhibit nonlinear geometric structure that can nonetheless be well-approximated linearly, providing empirical support for the linear representation hypothesis commonly assumed in model transparency methods. Third, we demonstrate that the learned latent representation space can be leveraged to quantify uncertainty in emotion processing tasks. Our findings suggest that LLMs acquire affective representations with geometric structure paralleling established models of human emotion, with practical implications for model interpretability and safety.

LGApr 11, 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries

Neil He, Jiahong Liu, Buze Zhang et al.

In the era of foundation models and Large Language Models (LLMs), Euclidean space has been the de facto geometric setting for machine learning architectures. However, recent literature has demonstrated that this choice comes with fundamental limitations. At a large scale, real-world data often exhibit inherently non-Euclidean structures, such as multi-way relationships, hierarchies, symmetries, and non-isotropic scaling, in a variety of domains, such as languages, vision, and the natural sciences. It is challenging to effectively capture these structures within the constraints of Euclidean spaces. This position paper argues that moving beyond Euclidean geometry is not merely an optional enhancement but a necessity to maintain the scaling law for the next-generation of foundation models. By adopting these geometries, foundation models could more efficiently leverage the aforementioned structures. Task-aware adaptability that dynamically reconfigures embeddings to match the geometry of downstream applications could further enhance efficiency and expressivity. Our position is supported by a series of theoretical and empirical investigations of prevalent foundation models.Finally, we outline a roadmap for integrating non-Euclidean geometries into foundation models, including strategies for building geometric foundation models via fine-tuning, training from scratch, and hybrid approaches.

NCSep 12, 2025
On a Geometry of Interbrain Networks

Nicolás Hinrichs, Noah Guzmán, Melanie Weber

Effective analysis in neuroscience benefits significantly from robust conceptual frameworks. Traditional metrics of interbrain synchrony in social neuroscience typically depend on fixed, correlation-based approaches, restricting their explanatory capacity to descriptive observations. Inspired by the successful integration of geometric insights in network science, we propose leveraging discrete geometry to examine the dynamic reconfigurations in neural interactions during social exchanges. Unlike conventional synchrony approaches, our method interprets inter-brain connectivity changes through the evolving geometric structures of neural networks. This geometric framework is realized through a pipeline that identifies critical transitions in network connectivity using entropy metrics derived from curvature distributions. By doing so, we significantly enhance the capacity of hyperscanning methodologies to uncover underlying neural mechanisms in interactive social behavior.

LGMar 1, 2025
Performance Heterogeneity in Graph Neural Networks: Lessons for Architecture Design and Preprocessing

Lukas Fesser, Melanie Weber

Graph Neural Networks have emerged as the most popular architecture for graph-level learning, including graph classification and regression tasks, which frequently arise in areas such as biochemistry and drug discovery. Achieving good performance in practice requires careful model design. Due to gaps in our understanding of the relationship between model and data characteristics, this often requires manual architecture and hyperparameter tuning. This is particularly pronounced in graph-level tasks, due to much higher variation in the input data than in node-level tasks. To work towards closing these gaps, we begin with a systematic analysis of individual performance in graph-level tasks. Our results establish significant performance heterogeneity in both message-passing and transformer-based architectures. We then investigate the interplay of model and data characteristics as drivers of the observed heterogeneity. Our results suggest that graph topology alone cannot explain heterogeneity. Using the Tree Mover's Distance, which jointly evaluates topological and feature information, we establish a link between class-distance ratios and performance heterogeneity in graph classification. These insights motivate model and data preprocessing choices that account for heterogeneity between graphs. We propose a selective rewiring approach, which only targets graphs whose individual performance benefits from rewiring. We further show that the optimal network depth depends on the graph's spectrum, which motivates a heuristic for choosing the number of GNN layers. Our experiments demonstrate the utility of both design choices in practice.

OCOct 12, 2024
Structured Regularization for Constrained Optimization on the SPD Manifold

Andrew Cheng, Melanie Weber

Matrix-valued optimization tasks, including those involving symmetric positive definite (SPD) matrices, arise in a wide range of applications in machine learning, data science and statistics. Classically, such problems are solved via constrained Euclidean optimization, where the domain is viewed as a Euclidean space and the structure of the matrices (e.g., positive definiteness) enters as constraints. More recently, geometric approaches that leverage parametrizations of the problem as unconstrained tasks on the corresponding matrix manifold have been proposed. While they exhibit algorithmic benefits in many settings, they cannot directly handle additional constraints, such as inequality or sparsity constraints. A remedy comes in the form of constrained Riemannian optimization methods, notably, Riemannian Frank-Wolfe and Projected Gradient Descent. However, both algorithms require potentially expensive subroutines that can introduce computational bottlenecks in practise. To mitigate these shortcomings, we introduce a class of structured regularizers, based on symmetric gauge functions, which allow for solving constrained optimization on the SPD manifold with faster unconstrained methods. We show that our structured regularizers can be chosen to preserve or induce desirable structure, in particular convexity and "difference of convex" structure. We demonstrate the effectiveness of our approach in numerical experiments.

COJan 8
Neural Algorithmic Reasoning for Approximate $k$-Coloring with Recursive Warm Starts

Knut Vanderbush, Melanie Weber

Node coloring is the task of assigning colors to the nodes of a graph such that no two adjacent nodes have the same color, while using as few colors as possible. It is the most widely studied instance of graph coloring and of central importance in graph theory; major results include the Four Color Theorem and work on the Hadwiger-Nelson Problem. As an abstraction of classical combinatorial optimization tasks, such as scheduling and resource allocation, it is also rich in practical applications. Here, we focus on a relaxed version, approximate $k$-coloring, which is the task of assigning at most $k$ colors to the nodes of a graph such that the number of edges whose vertices have the same color is approximately minimized. While classical approaches leverage mathematical programming or SAT solvers, recent studies have explored the use of machine learning. We follow this route and explore the use of graph neural networks (GNNs) for node coloring. We first present an optimized differentiable algorithm that improves a prior approach by Schuetz et al. with orthogonal node feature initialization and a loss function that penalizes conflicting edges more heavily when their endpoints have higher degree; the latter inspired by the classical result that a graph is $k$-colorable if and only if its $k$-core is $k$-colorable. Next, we introduce a lightweight greedy local search algorithm and show that it may be improved by recursively computing a $(k-1)$-coloring to use as a warm start. We then show that applying such recursive warm starts to the GNN approach leads to further improvements. Numerical experiments on a range of different graph structures show that while the local search algorithms perform best on small inputs, the GNN exhibits superior performance at scale. The recursive warm start may be of independent interest beyond graph coloring for local search methods for combinatorial optimization.

LGDec 5, 2025
Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry

Behrooz Tahmasebi, Melanie Weber

Enforcing exact symmetry in machine learning models often yields significant gains in scientific applications, serving as a powerful inductive bias. However, recent work suggests that relying on approximate symmetry can offer greater flexibility and robustness. Despite promising empirical evidence, there has been little theoretical understanding, and in particular, a direct comparison between exact and approximate symmetry is missing from the literature. In this paper, we initiate this study by asking: What is the cost of enforcing exact versus approximate symmetry? To address this question, we introduce averaging complexity, a framework for quantifying the cost of enforcing symmetry via averaging. Our main result is an exponential separation: under standard conditions, achieving exact symmetry requires linear averaging complexity, whereas approximate symmetry can be attained with only logarithmic averaging complexity. To the best of our knowledge, this provides the first theoretical separation of these two cases, formally justifying why approximate symmetry may be preferable in practice. Beyond this, our tools and techniques may be of independent interest for the broader study of symmetries in machine learning.

OCOct 23, 2025
Iso-Riemannian Optimization on Learned Data Manifolds

Willem Diepeveen, Melanie Weber

High-dimensional data that exhibit an intrinsic low-dimensional structure are ubiquitous in machine learning and data science. While various approaches allow for learning the corresponding data manifold from finite samples, performing downstream tasks such as optimization directly on these learned manifolds presents a significant challenge. This work introduces a principled framework for optimization on learned data manifolds using iso-Riemannian geometry. Our approach addresses key limitations of classical Riemannian optimization in this setting, specifically, that the Levi-Civita connection fails to yield constant-speed geodesics, and that geodesic convexity assumptions break down under the learned pullback constructions commonly used in practice. To overcome these challenges, we propose new notions of monotonicity and Lipschitz continuity tailored to the iso-Riemannian setting and propose iso-Riemannian descent algorithms for which we provide a detailed convergence analysis. We demonstrate the practical effectiveness of those algorithms on both synthetic and real datasets, including MNIST under a learned pullback structure. Our approach yields interpretable barycentres, improved clustering, and provably efficient solutions to inverse problems, even in high-dimensional settings. These results establish that optimization under iso-Riemannian geometry can overcome distortions inherent to learned manifold mappings.

LGSep 26, 2025
Neural Feature Geometry Evolves as Discrete Ricci Flow

Moritz Hehl, Max von Renesse, Melanie Weber

Deep neural networks learn feature representations via complex geometric transformations of the input data manifold. Despite the models' empirical success across domains, our understanding of neural feature representations is still incomplete. In this work we investigate neural feature geometry through the lens of discrete geometry. Since the input data manifold is typically unobserved, we approximate it using geometric graphs that encode local similarity structure. We provide theoretical results on the evolution of these graphs during training, showing that nonlinear activations play a crucial role in shaping feature geometry in feedforward neural networks. Moreover, we discover that the geometric transformations resemble a discrete Ricci flow on these graphs, suggesting that neural feature geometry evolves analogous to Ricci flow. This connection is supported by experiments on over 20,000 feedforward neural networks trained on binary classification tasks across both synthetic and real-world datasets. We observe that the emergence of class separability corresponds to the emergence of community structure in the associated graph representations, which is known to relate to discrete Ricci flow dynamics. Building on these insights, we introduce a novel framework for locally evaluating geometric transformations through comparison with discrete Ricci flow dynamics. Our results suggest practical design principles, including a geometry-informed early-stopping heuristic and a criterion for selecting network depth.

LGSep 25, 2025
Bispectral OT: Dataset Comparison using Symmetry-Aware Optimal Transport

Annabel Ma, Kaiying Hou, David Alvarez-Melis et al. · harvard, microsoft-research

Optimal transport (OT) is a widely used technique in machine learning, graphics, and vision that aligns two distributions or datasets using their relative geometry. In symmetry-rich settings, however, OT alignments based solely on pairwise geometric distances between raw features can ignore the intrinsic coherence structure of the data. We introduce Bispectral Optimal Transport, a symmetry-aware extension of discrete OT that compares elements using their representation using the bispectrum, a group Fourier invariant that preserves all signal structure while removing only the variation due to group actions. Empirically, we demonstrate that the transport plans computed with Bispectral OT achieve greater class preservation accuracy than naive feature OT on benchmark datasets transformed with visual symmetries, improving the quality of meaningful correspondences that capture the underlying semantic label structure in the dataset while removing nuisance variation not affecting class or content.

AISep 2, 2025
The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)

Andrew Ferguson, Marisa LaFleur, Lars Ruthotto et al. · stanford

This community paper developed out of the NSF Workshop on the Future of Artificial Intelligence (AI) and the Mathematical and Physics Sciences (MPS), which was held in March 2025 with the goal of understanding how the MPS domains (Astronomy, Chemistry, Materials Research, Mathematical Sciences, and Physics) can best capitalize on, and contribute to, the future of AI. We present here a summary and snapshot of the MPS community's perspective, as of Spring/Summer 2025, in a rapidly developing field. The link between AI and MPS is becoming increasingly inextricable; now is a crucial moment to strengthen the link between AI and Science by pursuing a strategy that proactively and thoughtfully leverages the potential of AI for scientific discovery and optimizes opportunities to impact the development of AI by applying concepts from fundamental science. To achieve this, we propose activities and strategic priorities that: (1) enable AI+MPS research in both directions; (2) build up an interdisciplinary community of AI+MPS researchers; and (3) foster education and workforce development in AI for MPS researchers and students. We conclude with a summary of suggested priorities for funding agencies, educational institutions, and individual researchers to help position the MPS community to be a leader in, and take full advantage of, the transformative potential of AI+MPS.

LGJun 2, 2025
Automated Manifold Learning for Reduced Order Modeling

Imran Nasim, Melanie Weber

The problem of identifying geometric structure in data is a cornerstone of (unsupervised) learning. As a result, Geometric Representation Learning has been widely applied across scientific and engineering domains. In this work, we investigate the use of Geometric Representation Learning for the data-driven discovery of system dynamics from spatial-temporal data. We propose to encode similarity structure in such data in a spatial-temporal proximity graph, to which we apply a range of classical and deep learning-based manifold learning approaches to learn reduced order dynamics. We observe that while manifold learning is generally capable of recovering reduced order dynamics, the quality of the learned representations varies substantially across different algorithms and hyperparameter choices. This is indicative of high sensitivity to the inherent geometric assumptions of the respective approaches and suggests a need for careful hyperparameter tuning, which can be expensive in practise. To overcome these challenges, we propose a framework for Automated Manifold Learning, which selects a manifold learning approach and corresponding hyperparameter choices based on representative subsamples of the input graph. We demonstrate that the proposed framework leads to performance gains both in scalability and in the learned representations' accuracy in capturing local and global geometric features of the underlying system dynamics.

CGMay 20, 2025
Towards Non-Euclidean Foundation Models: Advancing AI Beyond Euclidean Frameworks

Menglin Yang, Yifei Zhang, Jialin Chen et al.

In the era of foundation models and Large Language Models (LLMs), Euclidean space is the de facto geometric setting of our machine learning architectures. However, recent literature has demonstrated that this choice comes with fundamental limitations. To that end, non-Euclidean learning is quickly gaining traction, particularly in web-related applications where complex relationships and structures are prevalent. Non-Euclidean spaces, such as hyperbolic, spherical, and mixed-curvature spaces, have been shown to provide more efficient and effective representations for data with intrinsic geometric properties, including web-related data like social network topology, query-document relationships, and user-item interactions. Integrating foundation models with non-Euclidean geometries has great potential to enhance their ability to capture and model the underlying structures, leading to better performance in search, recommendations, and content understanding. This workshop focuses on the intersection of Non-Euclidean Foundation Models and Geometric Learning (NEGEL), exploring its potential benefits, including the potential benefits for advancing web-related technologies, challenges, and future directions. Workshop page: [https://hyperboliclearning.github.io/events/www2025workshop](https://hyperboliclearning.github.io/events/www2025workshop)

LGFeb 13, 2025
Enhancing the Utility of Higher-Order Information in Relational Learning

Raphael Pellegrin, Lukas Fesser, Melanie Weber

Higher-order information is crucial for relational learning in many domains where relationships extend beyond pairwise interactions. Hypergraphs provide a natural framework for modeling such relationships, which has motivated recent extensions of graph neural network architectures to hypergraphs. However, comparisons between hypergraph architectures and standard graph-level models remain limited. In this work, we systematically evaluate a selection of hypergraph-level and graph-level architectures, to determine their effectiveness in leveraging higher-order information in relational learning. Our results show that graph-level architectures applied to hypergraph expansions often outperform hypergraph-level ones, even on inputs that are naturally parametrized as hypergraphs. As an alternative approach for leveraging higher-order information, we propose hypergraph-level encodings based on classical hypergraph characteristics. While these encodings do not significantly improve hypergraph architectures, they yield substantial performance gains when combined with graph-level models. Our theoretical analysis shows that hypergraph-level encodings provably increase the representational power of message-passing graph neural networks beyond that of their graph-level counterparts.

LGJun 3, 2024
Hardness of Learning Neural Networks under the Manifold Hypothesis

Bobak T. Kiani, Jason Wang, Melanie Weber

The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean data inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds which can be reliably reconstructed via manifold learning. Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real world data.

CYSep 21, 2021
Identifying biases in legal data: An algorithmic fairness perspective

Jackson Sargent, Melanie Weber

The need to address representation biases and sentencing disparities in legal case data has long been recognized. Here, we study the problem of identifying and measuring biases in large-scale legal case data from an algorithmic fairness perspective. Our approach utilizes two regression models: A baseline that represents the decisions of a "typical" judge as given by the data and a "fair" judge that applies one of three fairness concepts. Comparing the decisions of the "typical" judge and the "fair" judge allows for quantifying biases across demographic groups, as we demonstrate in four case studies on criminal data from Cook County (Illinois).

LGApr 11, 2020
Robust Large-Margin Learning in Hyperbolic Space

Melanie Weber, Manzil Zaheer, Ankit Singh Rawat et al.

Recently, there has been a surge of interest in representation learning in hyperbolic spaces, driven by their ability to represent hierarchical data with significantly fewer dimensions than standard Euclidean spaces. However, the viability and benefits of hyperbolic spaces for downstream machine learning tasks have received less attention. In this paper, we present, to our knowledge, the first theoretical guarantees for learning a classifier in hyperbolic rather than Euclidean space. Specifically, we consider the problem of learning a large-margin classifier for data possessing a hierarchical structure. We provide an algorithm to efficiently learn a large-margin hyperplane, relying on the careful injection of adversarial examples. Finally, we prove that for hierarchical data that embeds well into hyperbolic space, the low embedding dimension ensures superior guarantees when learning the classifier directly in hyperbolic space.

LGOct 12, 2019
Neighborhood Growth Determines Geometric Priors for Relational Representation Learning

Melanie Weber

The problem of identifying geometric structure in heterogeneous, high-dimensional data is a cornerstone of representation learning. While there exists a large body of literature on the embeddability of canonical graphs, such as lattices or trees, the heterogeneity of the relational data typically encountered in practice limits the applicability of these classical methods. In this paper, we propose a combinatorial approach to evaluating embeddability, i.e., to decide whether a data set is best represented in Euclidean, Hyperbolic or Spherical space. Our method analyzes nearest-neighbor structures and local neighborhood growth rates to identify the geometric priors of suitable embedding spaces. For canonical graphs, the algorithm's prediction provably matches classical results. As for large, heterogeneous graphs, we introduce an efficiently computable statistic that approximates the algorithm's decision rule. We validate our method over a range of benchmark data sets and compare with recently published optimization-based embeddability methods.

OCOct 9, 2019
Projection-free nonconvex stochastic optimization on Riemannian manifolds

Melanie Weber, Suvrit Sra

We study stochastic projection-free methods for constrained optimization of smooth functions on Riemannian manifolds, i.e., with additional constraints beyond the parameter domain being a manifold. Specifically, we introduce stochastic Riemannian Frank-Wolfe methods for nonconvex and geodesically convex problems. We present algorithms for both purely stochastic optimization and finite-sum problems. For the latter, we develop variance-reduced methods, including a Riemannian adaptation of the recently proposed Spider technique. For all settings, we recover convergence rates that are comparable to the best-known rates for their Euclidean counterparts. Finally, we discuss applications to two classic tasks: The computation of the Karcher mean of positive definite matrices and Wasserstein barycenters for multivariate normal distributions. For both tasks, stochastic Fw methods yield state-of-the-art empirical performance.

LGJan 17, 2019
The Oracle of DLphi

Dominik Alfke, Weston Baines, Jan Blechschmidt et al.

We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting that as long as one has access to enough data points, the quality of the data is irrelevant.

MLJul 1, 2018
Heuristic Framework for Multi-Scale Testing of the Multi-Manifold Hypothesis

F. Patricia Medina, Linda Ness, Melanie Weber et al.

When analyzing empirical data, we often find that global linear models overestimate the number of parameters required. In such cases, we may ask whether the data lies on or near a manifold or a set of manifolds (a so-called multi-manifold) of lower dimension than the ambient space. This question can be phrased as a (multi-) manifold hypothesis. The identification of such intrinsic multiscale features is a cornerstone of data analysis and representation and has given rise to a large body of work on manifold learning. In this work, we review key results on multi-scale data analysis and intrinsic dimension followed by the introduction of a heuristic, multiscale framework for testing the multi-manifold hypothesis. Our method implements a hypothesis test on a set of spline-interpolated manifolds constructed from variance-based intrinsic dimensions. The workflow is suitable for empirical data analysis as we demonstrate on two use cases.

OCOct 30, 2017
Riemannian Optimization via Frank-Wolfe Methods

Melanie Weber, Suvrit Sra

We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian "linear" oracle required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here too, the Riemannian "linear" oracle can be solved in closed form. Here, we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.