Uri Ascher

NA
6 papers
164 citations
Novelty 55%
AI Score 46

6 Papers

NA Jul 30, 2014
Improved bounds on sample size for implicit matrix trace estimators

Farbod Roosta-Khorasani, Uri Ascher

This article is concerned with Monte-Carlo methods for the estimation of the trace of an implicitly given matrix $A$ whose information is only available through matrix-vector products. Such a method approximates the trace by an average of $N$ expressions of the form $\mathbf{w}^t (A\mathbf{w})$, with random vectors $\mathbf{w}$ drawn from an appropriate distribution. We prove, discuss and experiment with bounds on the number of realizations $N$ required in order to guarantee a probabilistic bound on the relative error of the trace estimation upon employing Rademacher (Hutchinson), Gaussian and uniform unit vector (with and without replacement) probability distributions. In total, one necessary bound and six sufficient bounds are proved, improving upon and extending similar estimates obtained in the seminal work of Avron and Toledo (2011) in several dimensions. We first improve their bound on $N$ for the Hutchinson method, dropping a term that relates to $\mathrm{rank}(A)$ and making the bound comparable with that for the Gaussian estimator. We further prove new sufficient bounds for the Hutchinson, Gaussian and the unit vector estimators, as well as a necessary bound for the Gaussian estimator, which depend more specifically on properties of the matrix $A$. As such they may suggest for what type of matrices one distribution or another provides a particularly effective or relatively ineffective stochastic estimation method.
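
As a rough illustration of the estimator described in this abstract, the sketch below averages $N$ values of $\mathbf{w}^t (A\mathbf{w})$ with Rademacher vectors, touching $A$ only through a matrix-vector product. The function names, the tridiagonal test matrix and the sample count are assumptions chosen for the example; the sample-size bounds proved in the paper are not reproduced here.

```python
import random


def hutchinson_trace(matvec, n, num_samples, seed=0):
    """Estimate trace(A) from products A*w only, using Rademacher vectors w."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Each entry of w is +1 or -1 with equal probability.
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Aw = matvec(w)
        total += sum(wi * yi for wi, yi in zip(w, Aw))
    return total / num_samples


def tridiag_matvec(w):
    """Hypothetical test operator: 2 on the diagonal, -1 on the off-diagonals."""
    n = len(w)
    out = []
    for i in range(n):
        v = 2.0 * w[i]
        if i > 0:
            v -= w[i - 1]
        if i < n - 1:
            v -= w[i + 1]
        out.append(v)
    return out


n = 50
estimate = hutchinson_trace(tridiag_matvec, n, num_samples=2000)
exact = 2.0 * n  # trace of the test matrix
rel_err = abs(estimate - exact) / exact
print(f"estimate={estimate:.3f} exact={exact:.1f} rel_err={rel_err:.2e}")
```

For a diagonal matrix the Rademacher estimator is exact for any sample count, since every $w_i^2 = 1$; the error comes entirely from the off-diagonal entries.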

48.9 LG May 7
Target-Aware Data Augmentation for SAT Prediction

Eshed Gal, Uri Ascher, Eldad Haber

Learning-based approaches to NP-hard problems have shown increasing promise, but their progress is fundamentally constrained by the high cost of generating labeled training data. In domains such as Boolean satisfiability (SAT), standard pipelines rely on solver-in-the-loop labeling, which scales poorly with problem size and limits the amount of usable supervision. This bottleneck hinders the broader goal of leveraging machine learning to capture structure in hard combinatorial problems. In this work, we propose a target-aware, solver-free data generation framework for SAT that produces correctly labeled SAT and UNSAT instances by construction, eliminating the need for expensive solver calls. Our method aligns generated instances with the structural properties of a target benchmark, making synthetic data effective for downstream learning. We further develop a linear-programming-aware graph neural network (LPGNN) architecture that incorporates constraint-violation residuals into message passing, enabling the model to exploit underlying optimization structure. Together, these contributions support a data-centric paradigm for learning on NP-hard problems, where scalable, task-aligned data generation is as critical as model design. Our approach yields orders-of-magnitude speedups in data generation, demonstrating that benchmark-aligned synthetic data can effectively augment solver-labeled datasets for GNN-based SAT prediction.
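
To make "labeled by construction" concrete, here is a minimal sketch that is not the paper's generator. It assumes two textbook constructions: a planted assignment (every kept clause is satisfied by a hidden assignment, so the instance is SAT) and an embedded contradictory core (all sign patterns over a few variables, so the instance is UNSAT). All function names and parameters are hypothetical, and no alignment with a target benchmark is attempted.

```python
import itertools
import random


def satisfies(assignment, clauses):
    """True if every clause has a literal made true by the assignment.

    Literals are nonzero ints: +v means variable v, -v means its negation.
    """
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def planted_sat(num_vars, num_clauses, k=3, seed=0):
    """Random k-CNF that is SAT by construction (assumed planted-solution scheme)."""
    rng = random.Random(seed)
    assignment = [rng.choice((False, True)) for _ in range(num_vars)]
    clauses = []
    while len(clauses) < num_clauses:
        variables = rng.sample(range(1, num_vars + 1), k)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        # Keep only clauses that the hidden assignment satisfies.
        if satisfies(assignment, [clause]):
            clauses.append(clause)
    return clauses, assignment


def embedded_unsat(num_vars, num_clauses, core_size=2, seed=0):
    """CNF that is UNSAT by construction: random clauses plus a contradictory core."""
    clauses, _ = planted_sat(num_vars, num_clauses, seed=seed)
    rng = random.Random(seed + 1)
    core_vars = rng.sample(range(1, num_vars + 1), core_size)
    # All 2^core_size sign patterns: every assignment falsifies exactly one of them.
    core = [
        [v if positive else -v for v, positive in zip(core_vars, signs)]
        for signs in itertools.product((False, True), repeat=core_size)
    ]
    return clauses + core


sat_clauses, witness = planted_sat(num_vars=8, num_clauses=30)
unsat_clauses = embedded_unsat(num_vars=6, num_clauses=10)
print(satisfies(witness, sat_clauses))
```

Neither label requires a solver call: the witness certifies SAT, and the core certifies UNSAT.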

CL Nov 27, 2025
Reversing Large Language Models for Efficient Training and Fine-Tuning

Eshed Gal, Moshe Eliasof, Javier Turek et al.

Large Language Models (LLMs) are known for their expensive and time-consuming training. Consequently, LLMs are often fine-tuned for a specific task, starting from the pretrained weights of a foundation model. In this work, we introduce memory-efficient, reversible architectures for LLMs, inspired by symmetric and symplectic differential equations, and investigate their theoretical properties. Unlike standard baseline architectures, which store all intermediate activations, the proposed models use time-reversible dynamics to retrieve hidden states during backpropagation, removing the need to store activations. This property drastically reduces memory consumption, so larger batch sizes can be processed with the same available memory, which improves throughput. In addition, we propose an efficient method for converting existing, non-reversible LLMs into reversible architectures through fine-tuning, making our approach practical for exploiting existing pre-trained models. Our results show comparable or improved performance on several datasets and benchmarks, across several LLMs, building a scalable and efficient path towards reducing the memory and computational costs associated with both training from scratch and fine-tuning of LLMs.
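
The idea of recovering hidden states instead of storing them can be sketched with a generic additive-coupling block. This is an assumed, simplified stand-in, not the symmetric or symplectic architecture proposed in the paper; the functions `f` and `g` are arbitrary placeholders for learned sub-layers.

```python
import math


def f(x):
    """Placeholder for a learned sub-layer (assumption for this sketch)."""
    return [math.tanh(0.5 * v) for v in x]


def g(x):
    """Second placeholder sub-layer (assumption for this sketch)."""
    return [math.sin(v) for v in x]


def forward(x1, x2):
    """Additive coupling: each half is updated using only the other half."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1, y2):
    """Recover the inputs from the outputs by undoing the updates in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


x1, x2 = [0.3, -1.2, 2.0], [0.7, 0.1, -0.5]
rec1, rec2 = inverse(*forward(x1, x2))
max_err = max(abs(a - b) for a, b in zip(x1 + x2, rec1 + rec2))
print(f"max reconstruction error: {max_err:.1e}")
```

Because the inputs are recomputed from the outputs up to rounding error, a backward pass can regenerate each layer's activations on the fly, whatever `f` and `g` compute.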

NA Dec 3, 2014
Algorithms that satisfy a stopping criterion, probably

Uri Ascher, Farbod Roosta-Khorasani

Iterative numerical algorithms are typically equipped with a stopping criterion, where the iteration process is terminated when some error or misfit measure is deemed to be below a given tolerance. This is a useful setting for comparing algorithm performance, among other purposes. However, in practical applications a precise value for such a tolerance is rarely known; rather, only some possibly vague idea of the desired quality of the numerical approximation is at hand. We discuss four case studies from different areas of numerical computation, where uncertainty in the error tolerance value and in the stopping criterion is revealed in different ways. This leads us to think of approaches to relax the notion of exactly satisfying a tolerance value. We then concentrate on a probabilistic relaxation of the given tolerance. This allows, for instance, derivation of proven bounds on the sample size of certain Monte Carlo methods. We describe an algorithm that becomes more efficient in a controlled way as the uncertainty in the tolerance increases, and demonstrate this in the context of some particular applications of inverse problems.

NA Dec 1, 2014
Data completion and stochastic algorithms for PDE inversion problems with many measurements

Farbod Roosta-Khorasani, Kees van den Doel, Uri Ascher

Inverse problems involving systems of partial differential equations (PDEs) with many measurements or experiments can be very expensive to solve numerically. In a recent paper we examined dimensionality reduction methods, both stochastic and deterministic, to reduce this computational burden, assuming that all experiments share the same set of receivers. In the present article we consider the more general and practically important case where receivers are not shared across experiments. We propose a data completion approach to alleviate this problem. This is done by means of an approximation using an appropriately restricted gradient or Laplacian regularization, extending existing data for each experiment to the union of all receiver locations. Results using the method of simultaneous sources (SS) with the completed data are then compared to those obtained by a more general but slower random subset (RS) method which requires no modifications.

CV Aug 12, 2013
Faster gradient descent and the efficient recovery of images

Hui Huang, Uri Ascher

Much recent attention has been devoted to gradient descent algorithms where the steepest descent step size is replaced by a similar one from a previous iteration or gets updated only once every second step, thus forming a "faster gradient descent method". For unconstrained convex quadratic optimization these methods can converge much faster than steepest descent. But the context of interest here is application to certain ill-posed inverse problems, where the steepest descent method is known to have a smoothing, regularizing effect, and where a strict optimization solution is not necessary. Specifically, in this paper we examine the effect of replacing steepest descent by a faster gradient descent algorithm in the practical context of image deblurring and denoising tasks. We also propose several highly efficient schemes for carrying out these tasks independently of the step size selection, as well as a scheme for the case where both blur and significant noise are present. In the above context there are situations where many steepest descent steps are required, thus building slowness into the solution procedure. Our general conclusion regarding gradient descent methods is that in such cases the faster gradient descent methods offer substantial advantages. In other situations where no such slowness buildup arises the steepest descent method can still be very effective.
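
The step-size idea in the first sentences of this abstract can be sketched on a small convex quadratic. The example assumes the lagged variant, in which the step at iteration $k$ is the steepest descent step computed at iteration $k-1$. The diagonal test problem, tolerance and function names are illustrative assumptions, and the image deblurring and denoising setting of the paper is not modeled.

```python
def gradient_descent(diag, b, lagged, tol=1e-8, max_iter=5000):
    """Minimize 0.5*x^T A x - b^T x for A = diag(diag), starting from x = 0.

    lagged=False: classical steepest descent step size.
    lagged=True:  reuse the steepest descent step size of the previous iteration.
    Returns the final iterate and the number of iterations used.
    """
    x = [0.0] * len(b)
    prev_alpha = None
    for k in range(max_iter):
        r = [bi - ai * xi for ai, bi, xi in zip(diag, b, x)]  # residual = -gradient
        rr = sum(v * v for v in r)
        if rr ** 0.5 <= tol:
            return x, k
        # Exact line-search (steepest descent) step for the current residual.
        alpha_sd = rr / sum(ai * v * v for ai, v in zip(diag, r))
        use_lag = lagged and prev_alpha is not None
        alpha = prev_alpha if use_lag else alpha_sd
        x = [xi + alpha * v for xi, v in zip(x, r)]
        prev_alpha = alpha_sd
    return x, max_iter


# Hypothetical test problem with eigenvalues spread between 1 and 100.
diag = [1.0 + 11.0 * i for i in range(10)]
b = [1.0] * 10

x_sd, iters_sd = gradient_descent(diag, b, lagged=False)
x_lag, iters_lag = gradient_descent(diag, b, lagged=True)
print(f"steepest descent: {iters_sd} iterations, lagged: {iters_lag} iterations")
```

Comparing the two printed iteration counts shows how the lagged step behaves on this particular problem; the lagged iteration is not monotone in the residual, so the comparison is best made at convergence rather than step by step.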