LG · Oct 12, 2022
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Zonglin Li, Chong You, Srinadh Bhojanapalli et al. · deepmind
This paper studies the curious phenomenon that machine learning models with Transformer architectures have sparse activation maps. By activation map we refer to the intermediate output of the multi-layer perceptrons (MLPs) after a ReLU activation function, and by sparse we mean that on average very few entries (e.g., 3.0% for T5-Base and 6.3% for ViT-B16) are nonzero for each input to the MLP. Moreover, larger Transformers with more layers and wider MLP hidden dimensions are sparser as measured by the percentage of nonzero entries. Through extensive experiments we demonstrate that the emergence of sparsity is a prevalent phenomenon that occurs for both natural language processing and vision tasks, on both training and evaluation data, for Transformers of various configurations, at layers of all depths, as well as for other architectures including MLP-mixers and 2-layer MLPs. We show that sparsity also emerges using training datasets with random labels, random inputs, or an infinite amount of data, demonstrating that sparsity is not a result of a specific family of datasets. We discuss how sparsity immediately implies a way to significantly reduce the FLOP count and improve efficiency for Transformers. Moreover, we demonstrate, perhaps surprisingly, that enforcing an even sparser activation via Top-k thresholding with a small value of k brings a collection of desired but missing properties for Transformers, namely less sensitivity to noisy training data, more robustness to input corruptions, and better calibration for their prediction confidence.
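The two quantities in this abstract can be sketched in a few lines of NumPy: the fraction of nonzero entries after the ReLU, and Top-k thresholding. This is a minimal illustration, not the authors' code. The random weights, the shapes, and the function names are hypothetical. An untrained layer like this one will not show the low nonzero fractions reported for trained models.

```python
import numpy as np

def relu_mlp_hidden(x, w1, b1):
    """Activation map of an MLP block: ReLU(x @ W1 + b1), one row per input token."""
    return np.maximum(x @ w1 + b1, 0.0)

def nonzero_fraction(acts):
    """Average over inputs of the fraction of nonzero entries in each activation row."""
    return float(np.mean(np.count_nonzero(acts, axis=-1) / acts.shape[-1]))

def top_k_threshold(acts, k):
    """Keep the k largest entries in each row and set all other entries to zero."""
    idx = np.argpartition(acts, -k, axis=-1)[..., -k:]   # indices of the top k per row
    mask = np.zeros(acts.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, acts, 0.0)
```

With a small k, every row of the thresholded map has at most k nonzero entries. That is a hard cap on the quantity `nonzero_fraction` measures.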
CV · Aug 26, 2024 · Code
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
Yiwei Ma, Jiayi Ji, Ke Ye et al.
Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models poses a significant challenge. A crucial requirement in this field is a comprehensive evaluation benchmark that accurately assesses editing results and provides valuable insights for further development. In response to this need, we propose I2EBench, a comprehensive benchmark designed to automatically evaluate the quality of edited images produced by IIE models along multiple dimensions. I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding original and diverse instructions. It offers three distinctive characteristics: 1) Comprehensive Evaluation Dimensions: I2EBench comprises 16 evaluation dimensions that cover both high-level and low-level aspects, providing a comprehensive assessment of each IIE model. 2) Human Perception Alignment: To ensure the alignment of our benchmark with human perception, we conducted an extensive user study for each evaluation dimension. 3) Valuable Research Insights: By analyzing the advantages and disadvantages of existing IIE models across the 16 dimensions, we offer valuable research insights to guide future development in the field. We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models. The code, dataset and generated images from all IIE models are available on GitHub: https://github.com/cocoshe/I2EBench.
NA · Jun 16, 2016
Schubert varieties and distances between subspaces of different dimensions
Ke Ye, Lek-Heng Lim
We resolve a basic problem on subspace distances that often arises in applications: How can the usual Grassmann distance between equidimensional subspaces be extended to subspaces of different dimensions? We show that a natural solution is given by the distance of a point to a Schubert variety within the Grassmannian. This distance reduces to the Grassmann distance when the subspaces are equidimensional and does not depend on any embedding into a larger ambient space. Furthermore, it has a concrete expression involving principal angles, and is efficiently computable in numerically stable ways. Our results are largely independent of the Grassmann distance --- if desired, it may be substituted by any other common distances between subspaces. Our approach depends on a concrete algebraic geometric view of the Grassmannian that parallels the differential geometric perspective that is well-established in applied and computational mathematics.
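Here is a minimal NumPy sketch of the principal-angle computation behind this distance. It takes the principal angles from the SVD of $Q_A^\top Q_B$ and returns $(\sum_{i=1}^{\min(k,l)} \theta_i^2)^{1/2}$. This formula is our reading of the distance the paper arrives at for subspaces of dimensions $k$ and $l$. The function names are hypothetical. The plain `arccos` used here is less accurate for tiny angles than the numerically stable formulas the abstract mentions.

```python
import numpy as np

def principal_angles(a, b):
    """Principal angles (ascending) between the column spans of a (n x k) and b (n x l)."""
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    # singular values of Qa^T Qb are the cosines of the min(k, l) principal angles
    s = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def grassmann_distance(a, b):
    """sqrt(sum theta_i^2) over the min(k, l) principal angles."""
    theta = principal_angles(a, b)
    return float(np.sqrt(np.sum(theta ** 2)))
```

For example, a line contained in a plane gets distance 0, and a line orthogonal to a plane gets distance $\pi/2$. Neither value requires embedding the line into a higher-dimensional Grassmannian.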
NA · Jun 4, 2018
Geometric distance between positive definite matrices of different dimensions
Lek-Heng Lim, Rodolphe Sepulchre, Ke Ye
We show how the Riemannian distance on $\mathbb{S}^n_{++}$, the cone of $n\times n$ real symmetric or complex Hermitian positive definite matrices, may be used to naturally define a distance between two such matrices of different dimensions. Given that $\mathbb{S}^n_{++}$ also parameterizes $n$-dimensional ellipsoids, inner products on $\mathbb{R}^n$, and $n \times n$ covariance matrices of nondegenerate probability distributions, this gives us a natural way to define a geometric distance between a pair of such objects of different dimensions.
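This construction starts from the Riemannian distance on $\mathbb{S}^n_{++}$, $\delta(A,B) = (\sum_i \log^2 \lambda_i(A^{-1}B))^{1/2}$. Below is a sketch for real symmetric matrices of equal size. It does not reproduce the paper's extension to different dimensions, and the Cholesky-based route is one choice among several. The function name is hypothetical.

```python
import numpy as np

def spd_distance(a, b):
    """Affine-invariant distance sqrt(sum_i log^2 lambda_i(A^{-1} B)) for real SPD A, B."""
    l = np.linalg.cholesky(a)                    # A = L L^T
    linv = np.linalg.inv(l)
    m = linv @ b @ linv.T                        # same eigenvalues as A^{-1} B, but symmetric
    lam = np.linalg.eigvalsh((m + m.T) / 2)      # symmetrize to absorb round-off
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

Two properties are easy to check with this sketch. The distance is unchanged under congruence $A \mapsto XAX^\top$ for invertible $X$. It is also symmetric, because the eigenvalues of $B^{-1}A$ are the reciprocals of those of $A^{-1}B$.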
NA · Feb 9, 2019
Tensor network ranks
Ke Ye, Lek-Heng Lim
In problems involving approximation, completion, denoising, dimension reduction, estimation, interpolation, modeling, order reduction, regression, etc., we argue that the near-universal practice of assuming that a function, matrix, or tensor (which we will see are all the same object in this context) has low rank may be ill-justified. There are many natural instances where the object in question has high rank with respect to the classical notions of rank: matrix rank, tensor rank, and multilinear rank (the latter two being the most straightforward generalizations of the former). To remedy this, we show that one may vastly expand these classical notions of ranks: Given any undirected graph $G$, there is a notion of $G$-rank associated with $G$, which provides us with as many different kinds of ranks as there are undirected graphs. In particular, the popular tensor network states in physics (e.g., MPS, TTNS, PEPS) may be regarded as functions of a specific $G$-rank for various choices of $G$. Among other things, we will see that a function, matrix, or tensor may have very high matrix, tensor, or multilinear rank and yet very low $G$-rank for some $G$. In fact, the difference is in orders of magnitude, and the gaps between $G$-ranks and these classical ranks are arbitrarily large for some important objects in computer science, mathematics, and physics. Furthermore, we show that there is a $G$ such that almost every tensor has $G$-rank exponentially lower than its rank or the dimension of its ambient space.
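As we understand it, when $G$ is a path graph the $G$-rank corresponds to tensor-train (open-boundary MPS) ranks. Minimal tensor-train ranks can be read off the ranks of sequential unfoldings. The sketch below checks this on a tensor contracted from random rank-2 cores. It illustrates only this one graph, not general $G$-rank, and the helper names are hypothetical.

```python
import numpy as np

def tt_unfolding_ranks(t, tol=1e-10):
    """Ranks of the unfoldings (modes 1..k) x (modes k+1..d) for k = 1..d-1.

    For the path graph these are the minimal tensor-train ranks of t."""
    dims = t.shape
    ranks = []
    for k in range(1, len(dims)):
        rows = int(np.prod(dims[:k]))
        ranks.append(int(np.linalg.matrix_rank(t.reshape(rows, -1), tol=tol)))
    return ranks

def random_tt(dims, rank, rng):
    """Contract random TT cores of shape (r_prev, n_i, r_next) into a full tensor."""
    cores, r_prev = [], 1
    for i, n in enumerate(dims):
        r_next = 1 if i == len(dims) - 1 else rank
        cores.append(rng.standard_normal((r_prev, n, r_next)))
        r_prev = r_next
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))   # contract the shared bond
    return full.reshape(dims)
```

A $4\times4\times4\times4$ tensor built this way generically has unfolding ranks `[2, 2, 2]`, although its middle unfolding is a $16 \times 16$ matrix.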
AG · Apr 22, 2018
Topology of tensor ranks
Pierre Comon, Lek-Heng Lim, Yang Qi et al.
We study path-connectedness and homotopy groups of sets of tensors defined by tensor rank, border rank, multilinear rank, as well as their symmetric counterparts for symmetric tensors. We show that over $\mathbb{C}$, the set of rank-$r$ tensors and the set of symmetric rank-$r$ symmetric tensors are both path-connected if $r$ is not more than the complex generic rank; these results also extend to border rank and symmetric border rank over $\mathbb{C}$. Over $\mathbb{R}$, the set of rank-$r$ tensors is path-connected if it has the expected dimension but the corresponding result for symmetric rank-$r$ symmetric $d$-tensors depends on the order $d$: connected when $d$ is odd but not when $d$ is even. Border rank and symmetric border rank over $\mathbb{R}$ have essentially the same path-connectedness properties as rank and symmetric rank over $\mathbb{R}$. When $r$ is greater than the complex generic rank, we are unable to discern any general pattern: For example, we show that border-rank-three tensors in $\mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2$ fall into four connected components. For multilinear rank, the manifold of $d$-tensors of multilinear rank $(r_1,\dots,r_d)$ in $\mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$ is always path-connected, and the same is true in $\mathbb{R}^{n_1} \otimes \cdots \otimes \mathbb{R}^{n_d}$ unless $n_i = r_i = \prod_{j \ne i} r_j$ for some $i\in\{1, \dots, d\}$. Beyond path-connectedness, we determine, over both $\mathbb{R}$ and $\mathbb{C}$, the fundamental and higher homotopy groups of the set of tensors of a fixed small rank, and, taking advantage of Bott periodicity, those of the manifold of tensors of a fixed multilinear rank. We also obtain analogues of these results for symmetric tensors of a fixed symmetric rank or a fixed symmetric multilinear rank.
NA · Jun 9, 2016
Fast structured matrix computations: tensor rank and Cohn–Umans method
Ke Ye, Lek-Heng Lim
We discuss a generalization of the Cohn-Umans method, a potent technique developed for studying the bilinear complexity of matrix multiplication by embedding matrices into an appropriate group algebra. We investigate how the Cohn-Umans method may be used for bilinear operations other than matrix multiplication, with algebras other than group algebras, and we relate it to Strassen's tensor rank approach, the traditional framework for investigating bilinear complexity. To demonstrate the utility of the generalized method, we apply it to find the fastest algorithms for forming structured matrix-vector products, the basic operation underlying iterative algorithms for structured matrices. The structures we study include Toeplitz, Hankel, circulant, symmetric, skew-symmetric, f-circulant, block-Toeplitz-Toeplitz-block, triangular Toeplitz, Toeplitz-plus-Hankel, and sparse/banded/triangular matrices. Except for the case of skew-symmetric matrices, for which we have only upper bounds, the algorithms derived using the generalized Cohn-Umans method in all other instances are the fastest possible in the sense of having minimum bilinear complexity. We also apply this framework to a few other bilinear operations, including matrix-matrix, commutator, and simultaneous matrix products, and briefly discuss the relation between tensor nuclear norm and numerical stability.
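For comparison, here is the classic FFT method for a structured matrix-vector product: a Toeplitz matrix embedded in a circulant matrix. This is a swapped-in technique. It runs in $O(n \log n)$ time but is not the minimum-bilinear-complexity algorithm derived in the paper. Real inputs are assumed, and `toeplitz_matvec` is a hypothetical name.

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """T @ x for the n x n Toeplitz T with first column c and first row r (c[0] == r[0]).

    T is embedded in a 2n x 2n circulant matrix whose product with the zero-padded
    x is a circular convolution, evaluated with FFTs."""
    n = len(x)
    # first column of the circulant: c, one free entry (0), then r[n-1], ..., r[1]
    col = np.concatenate([c, [0.0], r[:0:-1]])
    xpad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(xpad))
    return y[:n].real    # the first n entries equal T @ x; drop round-off imaginary parts
```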
OC · Dec 18, 2018
Semi-Riemannian Manifold Optimization
Tingran Gao, Lek-Heng Lim, Ke Ye
We introduce in this paper a manifold optimization framework that utilizes semi-Riemannian structures on the underlying smooth manifolds. Unlike in Riemannian geometry, where each tangent space is equipped with a positive definite inner product, a semi-Riemannian manifold allows the metric tensor to be indefinite on each tangent space, i.e., possessing both positive and negative definite subspaces; differential geometric objects such as geodesics and parallel-transport can be defined on non-degenerate semi-Riemannian manifolds as well, and can be carefully leveraged to adapt Riemannian optimization algorithms to the semi-Riemannian setting. In particular, we discuss the metric independence of manifold optimization algorithms, and illustrate that the weaker but more general semi-Riemannian geometry often suffices for the purpose of optimizing smooth functions on smooth manifolds in practice.
NA · Mar 22, 2016
Algorithms for structured matrix-vector product of optimal bilinear complexity
Ke Ye, Lek-Heng Lim
We present explicit algorithms for computing structured matrix-vector products that are optimal in the sense of Strassen, i.e., using a provably minimum number of multiplications. These structures include Toeplitz/Hankel/circulant, symmetric, Toeplitz-plus-Hankel, sparse, and multilevel structures. The last category includes BTTB, BHHB, and BCCB, but also any arbitrarily complicated nested structures built out of other structures.
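For the BCCB case named above, the standard trick diagonalizes the matrix with a 2-D DFT, so the product reduces to a 2-D circular convolution. This is the textbook FFT method, not the paper's optimal-multiplication algorithm. The indexing convention in the docstring is an assumption.

```python
import numpy as np

def bccb_matvec(c, x):
    """Product of a block-circulant-with-circulant-blocks (BCCB) matrix and a vector.

    c: (n1, n2) array defining C[(i1, i2), (j1, j2)] = c[(i1 - j1) % n1, (i2 - j2) % n2].
    x: vector of length n1 * n2 in row-major (i1, i2) order."""
    n1, n2 = c.shape
    xm = x.reshape(n1, n2)
    y = np.fft.ifft2(np.fft.fft2(c) * np.fft.fft2(xm))   # 2-D circular convolution
    return y.real.reshape(-1)
```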
CV · Nov 8, 2023
FFINet: Future Feedback Interaction Network for Motion Forecasting
Miao Kang, Shengqi Wang, Sanping Zhou et al.
Motion forecasting plays a crucial role in autonomous driving, with the aim of predicting plausible future motions of traffic agents. Most existing methods mainly model the historical interactions between agents and the environment, and predict multi-modal trajectories in a feedforward process, ignoring potential trajectory changes caused by future interactions between agents. In this paper, we propose a novel Future Feedback Interaction Network (FFINet) that aggregates features from the current observations and potential future interactions for trajectory prediction. First, we employ different spatial-temporal encoders to embed the decomposed position vectors and the current position of each scene, providing rich features for the subsequent cross-temporal aggregation. Second, the relative interaction and cross-temporal aggregation strategies are sequentially adopted to integrate features in the current fusion module, observation interaction module, future feedback module and global fusion module, in which the future feedback module enables the understanding of pre-action by feeding the influence of preview information back into the feedforward prediction. Third, the comprehensive interaction features are fed into the final predictor to generate the joint predicted trajectories of multiple agents. Extensive experimental results show that FFINet achieves state-of-the-art performance on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks.
AI · Jul 29, 2024
Apple Intelligence Foundation Language Models
Tom Gunter, Zirui Wang, Chong Wang et al.
We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.
AI · Feb 4
ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control
Zhentao Tang, Yuqi Cui, Shixiong Kai et al.
Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agent coordination, and inefficient test-time scaling often limit performance. We introduce ReThinker, a confidence-aware agentic framework that orchestrates retrieval, tool use, and multi-agent reasoning through a stage-wise Solver-Critic-Selector architecture. Rather than following a fixed pipeline, ReThinker dynamically allocates computation based on model confidence, enabling adaptive tool invocation, guided multi-dimensional reflection, and robust confidence-weighted selection. To support scalable training without human annotation, we further propose a reverse data synthesis pipeline and an adaptive trajectory recycling strategy that transform successful reasoning traces into high-quality supervision. Experiments on HLE, GAIA, and XBench demonstrate that ReThinker consistently outperforms state-of-the-art foundation models with tools and existing deep research systems, achieving state-of-the-art results on expert-level reasoning tasks.
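The abstract does not spell out how confidence-weighted selection works. One generic reading, sketched here with hypothetical names, sums the confidence attached to each distinct candidate answer and returns the answer with the largest total. ReThinker's actual Selector may differ.

```python
from collections import defaultdict

def confidence_weighted_select(candidates):
    """Return the answer with the largest summed confidence.

    candidates: list of (answer, confidence) pairs, e.g. from several solver runs.
    Ties go to the answer seen first (dicts keep insertion order)."""
    totals = defaultdict(float)
    for answer, conf in candidates:
        totals[answer] += conf
    return max(totals, key=totals.get)
```

Given `[("42", 0.9), ("41", 0.6), ("41", 0.5)]`, this picks `"41"`, although the single most confident candidate says `"42"`. This is the kind of aggregation that distinguishes weighted selection from taking the top-1 answer.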
CL · Jul 22, 2025 · Code
Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?
Arduin Findeis, Floris Weers, Guoli Yin et al.
Pairwise preferences over model responses are widely collected to evaluate and provide feedback to large language models (LLMs). Given two alternative model responses to the same input, a human or AI annotator selects the "better" response. This approach can provide feedback for domains where other hard-coded metrics are difficult to obtain (e.g., chat response quality), thereby helping model evaluation or training. However, for some domains, high-quality pairwise comparisons can be tricky to obtain from both AI and human annotators. For example, for responses with many factual statements, annotators may disproportionately weigh writing quality rather than underlying facts. In this work, we explore augmenting standard AI annotator systems with additional tools to improve performance on three challenging response domains: long-form factual, math and code tasks. We propose a tool-using agentic system to provide higher quality feedback on these domains. Our system uses web-search and code execution to ground itself based on external validation, independent of the LLM's internal knowledge and biases. We provide extensive experimental results evaluating our method across the three targeted response domains as well as general annotation tasks, using RewardBench (incl. AlpacaEval and LLMBar), RewardMath, as well as three new datasets for domains with saturated pre-existing datasets. Our results indicate that external tools can indeed improve performance in many, but not all, cases. More generally, our experiments highlight the sensitivity of performance to simple parameters (e.g., prompt) and the need for improved (non-saturated) annotator benchmarks. We share our code at https://github.com/apple/ml-agent-evaluator.
RO · May 21, 2025
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Jiaming Zhou, Ke Ye, Jiayi Liu et al.
The generalization capabilities of vision-language-action (VLA) models to unseen tasks are crucial to achieving general-purpose robotic manipulation in open-world settings. However, the cross-task generalization capabilities of existing VLA models remain significantly underexplored. To address this gap, we introduce AGNOSTOS, a novel simulation benchmark designed to rigorously evaluate cross-task zero-shot generalization in manipulation. AGNOSTOS comprises 23 unseen manipulation tasks for testing, distinct from common training task distributions, and incorporates two levels of generalization difficulty to assess robustness. Our systematic evaluation reveals that current VLA models, despite being trained on diverse datasets, struggle to generalize effectively to these unseen tasks. To overcome this limitation, we propose Cross-Task In-Context Manipulation (X-ICM), a method that conditions large language models (LLMs) on in-context demonstrations from seen tasks to predict action sequences for unseen tasks. Additionally, we introduce a dynamics-guided sample selection strategy that identifies relevant demonstrations by capturing cross-task dynamics. On AGNOSTOS, X-ICM significantly improves cross-task zero-shot generalization performance over leading VLAs. We believe AGNOSTOS and X-ICM will serve as valuable tools for advancing general-purpose robotic manipulation.
LG · Jul 17, 2025
Apple Intelligence Foundation Language Models: Tech Report 2025
Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang et al. · apple-ml, cmu
We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global-local attention to deliver high quality with competitive cost on Apple's Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines. A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in our Responsible AI approach with safeguards like content filtering and locale-specific evaluation, as well as our commitment to protecting our users' privacy with innovations like Private Cloud Compute.
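The report names 2-bit quantization-aware training but gives no details here. Below is a common generic recipe, shown as an assumption rather than Apple's method. Fake quantization (quantize, then dequantize) is applied in the forward pass, and a straight-through estimator is used in the backward pass. The per-tensor scale rule and the function names are hypothetical.

```python
import numpy as np

def fake_quantize(w, bits=2):
    """Quantize-dequantize w onto a signed uniform grid with 2**bits levels.

    Returns the dequantized weights and a mask of entries that were not clipped."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1   # 2 bits -> levels -2..1
    scale = np.max(np.abs(w)) / 2 ** (bits - 1)            # hypothetical per-tensor scale
    if scale == 0:
        return np.zeros_like(w), np.ones(w.shape, dtype=bool)
    q = np.round(w / scale)
    in_range = (q >= qmin) & (q <= qmax)
    return np.clip(q, qmin, qmax) * scale, in_range

def ste_grad(upstream, in_range):
    """Straight-through estimator: pass gradients through unchanged except where clipped."""
    return upstream * in_range
```

With `w = [-1.0, -0.4, 0.1, 0.6, 1.0]` the scale is 0.5. The largest weight falls outside the 2-bit grid and is clipped, so its gradient is zeroed by `ste_grad`.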
AI · Sep 26, 2025
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
Sumanth Varambally, Thomas Voice, Yanchao Sun et al.
Large Language Models (LLMs) demonstrate impressive mathematical reasoning abilities, but their solutions frequently contain errors that cannot be automatically verified. Formal theorem proving systems such as Lean 4 offer automated verification with complete accuracy, motivating recent efforts to build specialized prover LLMs that generate verifiable proofs in formal languages. However, a significant gap remains: current prover LLMs solve substantially fewer problems than general-purpose LLMs operating in natural language. We introduce Hilbert, an agentic framework that bridges this gap by combining the complementary strengths of informal reasoning and formal verification. Our system orchestrates four components: an informal LLM that excels at mathematical reasoning, a specialized prover LLM optimized for Lean 4 tactics, a formal verifier, and a semantic theorem retriever. Given a problem that the prover is unable to solve, Hilbert employs recursive decomposition to split the problem into subgoals that it solves with the prover or reasoner LLM. It leverages verifier feedback to refine incorrect proofs as necessary. Experimental results demonstrate that Hilbert substantially outperforms existing approaches on key benchmarks, achieving 99.2% on miniF2F, 6.6 percentage points above the best publicly available method. Hilbert achieves the best known result on PutnamBench. It solves 462/660 problems (70.0%), outperforming proprietary approaches like SeedProver (50.4%) and achieving a 422% improvement over the best publicly available baseline. Thus, Hilbert effectively narrows the gap between informal reasoning and formal proof generation.
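The recursive decomposition loop can be sketched with two stub callables. `try_prove` stands in for the prover LLM plus the Lean 4 verifier, and `decompose` stands in for the informal reasoner. Everything here is hypothetical scaffolding: no real Lean or model API is called. Hilbert's actual control flow, including retrieval and verifier-guided refinement, is richer.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ProofNode:
    goal: str
    proof: Optional[str] = None
    children: List["ProofNode"] = field(default_factory=list)

def prove_recursively(goal, try_prove, decompose, depth=0, max_depth=3):
    """Try a direct proof; if that fails, split into subgoals and recurse.

    try_prove(goal) -> verified proof string or None (stub for prover + verifier).
    decompose(goal) -> list of subgoals (stub for the informal reasoner)."""
    proof = try_prove(goal)
    if proof is not None:
        return ProofNode(goal, proof)
    if depth >= max_depth:
        return None
    subgoals = decompose(goal)
    if not subgoals:
        return None
    children = []
    for sub in subgoals:
        child = prove_recursively(sub, try_prove, decompose, depth + 1, max_depth)
        if child is None:
            return None            # one unsolved subgoal sinks this decomposition
        children.append(child)
    # placeholder for stitching the subproofs back into a proof of `goal`
    return ProofNode(goal, "assemble(" + ", ".join(c.goal for c in children) + ")", children)
```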
ML · Nov 19, 2025
Atlas Gaussian processes on restricted domains and point clouds
Mu Niu, Yue Zhang, Ke Ye et al.
In real-world applications, data often reside in restricted domains with unknown boundaries, or as high-dimensional point clouds lying on a lower-dimensional, nontrivial, unknown manifold. Traditional Gaussian Processes (GPs) struggle to capture the underlying geometry in such settings. Some existing methods assume a flat space embedded in a point cloud, which can be represented by a single latent chart (latent space), while others perform poorly when the point cloud is sparse or irregularly sampled. The goal of this work is to address these challenges. The main contributions are twofold: (1) We establish the Atlas Brownian Motion (BM) framework for estimating the heat kernel on point clouds with unknown geometries and nontrivial topological structures; (2) Instead of directly using the heat kernel estimates, we construct a Riemannian-corrected kernel by combining the global heat kernel with a local RBF kernel, leading to the formulation of Riemannian-corrected Atlas Gaussian Processes (RC-AGPs). The resulting RC-AGPs are applied to regression tasks across synthetic and real-world datasets. These examples demonstrate that our method outperforms existing approaches in both heat kernel estimation and regression accuracy. It improves statistical inference by effectively bridging the gap between complex, high-dimensional observations and manifold-based inferences.
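The abstract does not specify the RC-AGP kernel, but the regression step it feeds into is standard GP regression. The sketch below computes the zero-mean GP posterior mean with a pluggable kernel, using an RBF kernel as the example. A heat-kernel-based kernel could be passed in instead; we do not implement one. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of x1 (m, d) and x2 (n, d)."""
    sq = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def gp_posterior_mean(x_train, y_train, x_test, kernel=rbf_kernel, noise=1e-6):
    """Zero-mean GP posterior mean K(x*, X) (K(X, X) + noise * I)^{-1} y."""
    k = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    l = np.linalg.cholesky(k)
    # two triangular-structured solves: alpha = (L L^T)^{-1} y
    alpha = np.linalg.solve(l.T, np.linalg.solve(l, y_train))
    return kernel(x_test, x_train) @ alpha
```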
CV · Oct 1, 2025
EgoTraj-Bench: Towards Robust Trajectory Prediction Under Ego-view Noisy Observations
Jiayi Liu, Jiaming Zhou, Ke Ye et al.
Reliable trajectory prediction from an ego-centric perspective is crucial for robotic navigation in human-centric environments. However, existing methods typically assume idealized observation histories, failing to account for the perceptual artifacts inherent in first-person vision, such as occlusions, ID switches, and tracking drift. This discrepancy between training assumptions and deployment reality severely limits model robustness. To bridge this gap, we introduce EgoTraj-Bench, the first real-world benchmark that grounds noisy, first-person visual histories in clean, bird's-eye-view future trajectories, enabling robust learning under realistic perceptual constraints. Building on this benchmark, we propose BiFlow, a dual-stream flow matching model that concurrently denoises historical observations and forecasts future motion by leveraging a shared latent representation. To better model agent intent, BiFlow incorporates our EgoAnchor mechanism, which conditions the prediction decoder on distilled historical features via feature modulation. Extensive experiments show that BiFlow achieves state-of-the-art performance, reducing minADE and minFDE by 10-15% on average and demonstrating superior robustness. We anticipate that our benchmark and model will provide a critical foundation for developing trajectory forecasting systems truly resilient to the challenges of real-world, ego-centric perception.
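minADE and minFDE are the usual multi-modal trajectory metrics: the smallest average displacement error and the smallest final displacement error over the $K$ predicted modes. A minimal sketch follows. Conventions vary: some benchmarks take both errors from the single mode with the lowest final error. Treat this as one assumed convention, with a hypothetical function name.

```python
import numpy as np

def min_ade_fde(pred, gt):
    """minADE and minFDE, each minimized independently over the K modes.

    pred: (K, T, 2) predicted future positions; gt: (T, 2) ground-truth positions."""
    err = np.linalg.norm(pred - gt[None], axis=-1)   # (K, T) per-step displacement errors
    ade = err.mean(axis=1)                           # average over time, per mode
    fde = err[:, -1]                                 # error at the final step, per mode
    return float(ade.min()), float(fde.min())
```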
CV · Dec 16, 2024
EGP3D: Edge-guided Geometric Preserving 3D Point Cloud Super-resolution for RGB-D camera
Zheng Fang, Ke Ye, Yaofang Liu et al.
Point clouds or depth images captured by current RGB-D cameras often suffer from low resolution, rendering them insufficient for applications such as 3D reconstruction and robotics. Existing point cloud super-resolution (PCSR) methods are either constrained by geometric artifacts or lack attention to edge details. To address these issues, we propose an edge-guided geometric-preserving 3D point cloud super-resolution (EGP3D) method tailored for RGB-D cameras. Our approach optimizes the point cloud with an edge constraint in a projected 2D space, thereby ensuring high-quality edge preservation in the 3D PCSR task. To tackle geometric optimization challenges in super-resolution point clouds, particularly preserving edge shapes and smoothness, we introduce a multi-faceted loss function that simultaneously optimizes the Chamfer distance, Hausdorff distance, and gradient smoothness. Existing datasets used for point cloud upsampling are predominantly synthetic and inadequately represent real-world scenarios, neglecting noise and stray-light effects. To address the scarcity of realistic RGB-D data for PCSR tasks, we built a dataset that captures real-world noise and stray-light effects, offering a more accurate representation of authentic environments. Validated through simulations and real-world experiments, the proposed method shows superior performance in preserving edge clarity and geometric details.
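The Chamfer and Hausdorff terms in this loss can be computed directly from the pairwise distance matrix between two point clouds. The sketch uses unsquared Euclidean distances and a symmetric sum for Chamfer. Many papers use squared distances or averages instead, and the abstract does not give EGP3D's exact weighting. The function names are hypothetical, and the dense $O(NM)$ distance matrix is only suitable for small clouds.

```python
import numpy as np

def pairwise_dist(p, q):
    """(N, M) Euclidean distances between points p (N, 3) and q (M, 3)."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)

def chamfer_distance(p, q):
    """Mean nearest-neighbour distance from p to q plus from q to p."""
    d = pairwise_dist(p, q)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def hausdorff_distance(p, q):
    """Largest nearest-neighbour distance in either direction."""
    d = pairwise_dist(p, q)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

Chamfer averages over all points, while Hausdorff reacts only to the worst outlier. That difference is a plausible reason to use both in one loss.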
LG · Jan 24, 2024
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Ke Ye, Heinrich Jiang, Afshin Rostamizadeh et al.
Pre-training large language models is known to be extremely resource intensive and often inefficient, under-utilizing the information encapsulated in the training text sequences. In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and replaced token detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $\tau$ iterations, then transitions to standard SC loss. We show empirically that the effectiveness of the hybrid objective is tied to the two-stage pre-training schedule, and provide extensive analysis on why this is the case. In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training, while enabling a 50% reduction in pre-training iterations and 40% reduction in total FLOPs. Alternatively, given the same amount of computing budget, we find that SpacTor results in significantly improved downstream benchmark performance.
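Span corruption (SC), one half of the hybrid objective, replaces chosen spans with sentinel tokens and trains the decoder to reconstruct them. The sketch below assumes T5-style `<extra_id_i>` sentinels and takes the spans as given and non-overlapping. It omits random span sampling and the RTD half, and `span_corrupt` is a hypothetical name.

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption for given (start, length) spans.

    Each span in the input becomes one sentinel; the target lists each sentinel
    followed by the tokens it replaced, and ends with one extra sentinel."""
    inputs, targets = [], []
    pos = 0
    for i, (start, length) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[pos:start])       # keep the text before this span
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])
        pos = start + length
    inputs.extend(tokens[pos:])
    targets.append(f"<extra_id_{len(spans)}>")
    return inputs, targets
```

For "Thank you for inviting me to your party last week" with spans `[(2, 2), (8, 1)]`, this yields the input `Thank you <extra_id_0> me to your party <extra_id_1> week` and the target `<extra_id_0> for inviting <extra_id_1> last <extra_id_2>`.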
CL · Dec 19, 2023
Gemini: A Family of Highly Capable Multimodal Models
Gemini Team, Rohan Anil, Sebastian Borgeaud et al.
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
OC · Jun 25, 2020
Intrinsic Gaussian Processes on Manifolds and Their Accelerations by Symmetry
Ke Ye, Mu Niu, Pokman Cheung et al.
Amidst the growing interest in nonparametric regression, we address a significant challenge in Gaussian processes (GPs) applied to manifold-based predictors. Existing methods primarily focus on low-dimensional constrained domains for heat kernel estimation, limiting their effectiveness in higher-dimensional manifolds. Our research proposes an intrinsic approach for constructing GPs on general manifolds such as orthogonal groups, unitary groups, Stiefel manifolds and Grassmannian manifolds. Our methodology estimates the heat kernel by simulating Brownian motion sample paths using the exponential map, ensuring independence from the manifold's embedding. Our main contributions are the strip algorithm, tailored for manifolds with extra symmetries, and the ball algorithm, designed for arbitrary manifolds. Both algorithms are rigorously substantiated through theoretical proofs and numerical testing, with the strip algorithm showing remarkable efficiency gains over traditional methods. This intrinsic approach delivers several key advantages, including applicability to high-dimensional manifolds and no requirement for a global parametrization or embedding. We demonstrate its practicality through regression case studies (torus knots and eight-dimensional projective spaces) and by developing binary classifiers for real-world datasets (planar images of gorilla skulls and diffusion tensor images). These classifiers outperform traditional methods, particularly in limited-data scenarios.
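The basic idea of estimating a heat kernel by simulating Brownian motion with the exponential map can be sketched on the unit sphere $S^2$, where the exponential map has a closed form. This is the naive Monte Carlo approach, not the paper's strip or ball algorithm. The step scheme, the ball-counting estimator, and all names are assumptions. With Gaussian steps of variance `dt` per coordinate, the simulated process has generator $\tfrac{1}{2}\Delta$, so the estimate targets the heat kernel under that time convention.

```python
import numpy as np

def brownian_paths_sphere(p0, t, n_steps, n_paths, rng):
    """Endpoints at time t of Brownian paths on the unit sphere S^2 started at p0.

    Each step draws a Gaussian vector (variance dt per coordinate), projects it onto
    the tangent plane at the current point, and applies the exponential map
    exp_p(v) = cos(|v|) p + sin(|v|) v / |v|."""
    dt = t / n_steps
    p = np.tile(p0, (n_paths, 1))
    for _ in range(n_steps):
        g = rng.standard_normal((n_paths, 3)) * np.sqrt(dt)
        v = g - np.sum(g * p, axis=1, keepdims=True) * p    # tangent projection
        nv = np.linalg.norm(v, axis=1, keepdims=True)
        # np.sinc(x) = sin(pi x) / (pi x), so this is sin(|v|) / |v|, safe at |v| = 0
        p = np.cos(nv) * p + np.sinc(nv / np.pi) * v
        p /= np.linalg.norm(p, axis=1, keepdims=True)       # guard against round-off
    return p

def heat_kernel_estimate(ends, x, radius):
    """Monte Carlo density at x: fraction of endpoints within geodesic distance
    `radius` of x, divided by the area 2*pi*(1 - cos(radius)) of that cap."""
    angles = np.arccos(np.clip(ends @ x, -1.0, 1.0))
    return float(np.mean(angles < radius)) / (2 * np.pi * (1 - np.cos(radius)))
```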
CV · Apr 5, 2016
Cohomology of Cryo-Electron Microscopy
Ke Ye, Lek-Heng Lim
The goal of cryo-electron microscopy (EM) is to reconstruct the 3-dimensional structure of a molecule from a collection of its 2-dimensional projected images. In this article, we show that the basic premise of cryo-EM, patching together 2-dimensional projections to reconstruct a 3-dimensional object, is naturally one of Čech cohomology with SO(2)-coefficients. We deduce that every cryo-EM reconstruction problem corresponds to an oriented circle bundle on a simplicial complex, allowing us to classify cryo-EM problems via principal bundles. In practice, the 2-dimensional images are noisy and a main task in cryo-EM is to denoise them. We will see how the aforementioned insights can be used towards this end.
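The Čech cocycle condition with SO(2) coefficients is easy to state once each rotation is stored as an angle. On every triangle $(i,j,k)$ it requires $g_{ij} g_{jk} = g_{ik}$, i.e. $\theta_{ij} + \theta_{jk} - \theta_{ik} \equiv 0 \pmod{2\pi}$. The sketch checks this condition and builds a coboundary (a trivial cocycle) from per-vertex angles. How the angles would be extracted from cryo-EM images is not modeled, and the names are hypothetical.

```python
import math

def is_cocycle(theta, triangles, tol=1e-9):
    """Check g_ij g_jk = g_ik on each triangle, with SO(2) elements stored as angles.

    theta: dict mapping ordered pairs (i, j) to the angle of g_ij.
    triangles: iterable of vertex triples (i, j, k)."""
    for i, j, k in triangles:
        defect = theta[(i, j)] + theta[(j, k)] - theta[(i, k)]
        # math.remainder maps the defect into [-pi, pi], so multiples of 2*pi vanish
        if abs(math.remainder(defect, 2 * math.pi)) > tol:
            return False
    return True

def coboundary(phi, edges):
    """Angles theta_ij = phi_i - phi_j from per-vertex angles phi (always a cocycle)."""
    theta = {}
    for i, j in edges:
        theta[(i, j)] = phi[i] - phi[j]
        theta[(j, i)] = phi[j] - phi[i]
    return theta
```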