Victor Minden

CV
7 papers
45 citations
Novelty 53%
AI Score 40

7 Papers

NA Jan 6, 2017
A recursive skeletonization factorization based on strong admissibility

Victor Minden, Kenneth L. Ho, Anil Damle et al.

We introduce the strong recursive skeletonization factorization (RS-S), a new approximate matrix factorization based on recursive skeletonization for solving discretizations of linear integral equations associated with elliptic partial differential equations in two and three dimensions (and other matrices with similar hierarchical rank structure). Unlike previous skeletonization-based factorizations, RS-S uses a simple modification of skeletonization, strong skeletonization, which compresses only far-field interactions. This leads to an approximate factorization in the form of a product of many block unit-triangular matrices that may be used as a preconditioner or moderate-accuracy direct solver, with dramatically reduced rank growth. We further combine the strong skeletonization procedure with alternating near-field compression to obtain the hybrid recursive skeletonization factorization (RS-WS), a modification of RS-S that exhibits reduced storage cost in many settings. Under suitable rank assumptions both RS-S and RS-WS exhibit linear computational complexity, which we demonstrate with a number of numerical examples.

NA Nov 2, 2015
A technique for updating hierarchical skeletonization-based factorizations of integral operators

Victor Minden, Anil Damle, Kenneth L. Ho et al.

We present a method for updating certain hierarchical factorizations for solving linear integral equations with elliptic kernels. In particular, given a factorization corresponding to some initial geometry or material parameters, we can locally perturb the geometry or coefficients and update the initial factorization to reflect this change with asymptotic complexity that is polylogarithmic in the total number of unknowns and linear in the number of perturbed unknowns. We apply our method to the recursive skeletonization factorization and hierarchical interpolative factorization and demonstrate scaling results for a number of different 2D problem setups.

LG Mar 18
Efficient Exploration at Scale

Seyed Mohammad Asghari, Chris Chute, Vikranth Dwaracherla et al.

We develop an online learning algorithm that dramatically improves the data efficiency of reinforcement learning from human feedback (RLHF). Our algorithm incrementally updates reward and language models as choice data is received. The reward model is fit to the choice data, while the language model is updated by a variation of REINFORCE, with reinforcement signals provided by the reward model. Several features enable the efficiency gains: a small affirmative nudge added to each reinforcement signal, an epistemic neural network that models reward uncertainty, and information-directed exploration. With Gemma large language models (LLMs), our algorithm matches the performance of offline RLHF trained on 200K labels using fewer than 20K labels, representing more than a 10x gain in data efficiency. Extrapolating from our results, we expect our algorithm trained on 1M labels to match offline RLHF trained on 1B labels. This represents a 1,000x gain. To our knowledge, these are the first results to demonstrate that such large improvements are possible.

CV May 3, 2023
Synthetic DOmain-Targeted Augmentation (S-DOTA) Improves Model Generalization in Digital Pathology

Sai Chowdary Gullapally, Yibo Zhang, Nitin Kumar Mittal et al.

Machine learning algorithms have the potential to improve patient outcomes in digital pathology. However, generalization of these tools is currently limited by sensitivity to variations in tissue preparation, staining procedures and scanning equipment that lead to domain shift in digitized slides. To overcome this limitation and improve model generalization, we studied the effectiveness of two Synthetic DOmain-Targeted Augmentation (S-DOTA) methods, namely CycleGAN-enabled Scanner Transform (ST) and targeted Stain Vector Augmentation (SVA), and compared them against the International Color Consortium (ICC) profile-based color calibration (ICC Cal) method and a baseline method using traditional brightness, color and noise augmentations. We evaluated the ability of these techniques to improve model generalization to various tasks and settings: four models, two model types (tissue segmentation and cell classification), two loss functions, six labs, six scanners, and three indications (hepatocellular carcinoma (HCC), nonalcoholic steatohepatitis (NASH), prostate adenocarcinoma). We compared these methods based on the macro-averaged F1 scores on in-distribution (ID) and out-of-distribution (OOD) test sets across multiple domains, and found that S-DOTA methods (i.e., ST and SVA) led to significant improvements over ICC Cal and baseline on OOD data while maintaining comparable performance on ID data. Thus, we demonstrate that S-DOTA may help address generalization due to domain shift in real world applications.

CO Aug 6, 2018
Efficient Principal Subspace Projection of Streaming Data Through Fast Similarity Matching

Andrea Giovannucci, Victor Minden, Cengiz Pehlevan et al.

Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a time and must be processed online. Here, we introduce a computationally efficient version of similarity matching, a framework for online dimensionality reduction that incrementally estimates the top K-dimensional principal subspace of streamed data while keeping in memory only the last sample and the current iterate. To assess the performance of our approach, we construct and make public a test suite containing both a synthetic data generator and the infrastructure to test online dimensionality reduction algorithms on real datasets, as well as performant implementations of our algorithm and competing algorithms with similar aims. Among the algorithms considered we find our approach to be competitive, performing among the best on both synthetic and real data.
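
The streaming setting described above can be illustrated with a much simpler classical method. The sketch below uses Oja's rule for the single top principal direction (K = 1). It is not the similarity matching algorithm from the paper; it is swapped in only to show what "keeping in memory only the last sample and the current iterate" looks like in practice. The function names, step size, and toy data are assumptions made for illustration.

```python
import math

def oja_update(w, x, eta):
    """One step of Oja's rule for the top principal direction (K = 1)."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # projection of the sample onto w
    return [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]

def stream_top_direction(samples, w0, eta=0.1):
    """Consume samples one at a time, storing only the current iterate."""
    w = list(w0)
    for x in samples:
        w = oja_update(w, x, eta)
    return w

# Toy stream (assumed data): every sample lies along u = (0.6, 0.8),
# with alternating sign, so u is the only principal direction.
u = (0.6, 0.8)
samples = ((s * u[0], s * u[1]) for s in (1.0, -1.0) * 100)
w = stream_top_direction(samples, w0=(1.0, 0.0))

# Cosine of the angle between the estimate and the true direction.
alignment = abs(w[0] * u[0] + w[1] * u[1]) / math.hypot(w[0], w[1])
```

Because the samples come from a generator, no more than one data point is held in memory at a time; `alignment` approaches 1 as the iterate lines up with `u`.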

ME Sep 7, 2017
Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations

Victor Minden, Anil Damle, Kenneth L. Ho et al.

Maximum likelihood estimation for parameter-fitting given observations from a Gaussian process in space is a computationally-demanding task that restricts the use of such methods to moderately-sized datasets. We present a framework for unstructured observations in two spatial dimensions that allows for evaluation of the log-likelihood and its gradient (i.e., the score equations) in $\tilde O(n^{3/2})$ time under certain assumptions, where $n$ is the number of observations. Our method relies on the skeletonization procedure described by Martinsson & Rokhlin in the form of the recursive skeletonization factorization of Ho & Ying. Combining this with an adaptation of the matrix peeling algorithm of Lin et al. for constructing $\mathcal{H}$-matrix representations of black-box operators, we obtain a framework that can be used in the context of any first-order optimization routine to quickly and accurately compute maximum-likelihood estimates.
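
For reference, the two quantities named in this abstract have standard closed forms. The expressions below assume zero-mean observations $z \in \mathbb{R}^n$ with covariance matrix $\Sigma(\theta)$ and write $\Sigma_i = \partial \Sigma / \partial \theta_i$; this notation is introduced here for illustration and may differ from the paper's.

$$
\ell(\theta) = -\frac{1}{2} z^{T} \Sigma(\theta)^{-1} z - \frac{1}{2} \log \det \Sigma(\theta) - \frac{n}{2} \log 2\pi
$$

$$
\frac{\partial \ell}{\partial \theta_i} = \frac{1}{2} z^{T} \Sigma^{-1} \Sigma_i \Sigma^{-1} z - \frac{1}{2} \operatorname{Tr}\left( \Sigma^{-1} \Sigma_i \right)
$$

Evaluating these directly with dense linear algebra costs $O(n^3)$ because of the solves, the log-determinant, and the trace, which is the cost the abstract's $\tilde O(n^{3/2})$ figure is set against.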

ML May 30, 2017
Sparse canonical correlation analysis

Xiaotong Suo, Victor Minden, Bradley Nelson et al.

Canonical correlation analysis, proposed by Hotelling [6], measures the linear relationship between two multidimensional variables. In high-dimensional settings, classical canonical correlation analysis breaks down. We propose a sparse canonical correlation analysis that adds l1 constraints on the canonical vectors, and we show how to solve it efficiently using the linearized alternating direction method of multipliers (ADMM) and using TFOCS as a black box. We illustrate this idea on simulated data.
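
In ADMM-type solvers, an l1 term is commonly handled through the soft-thresholding operator, which is the proximal operator of a scaled l1 norm and is what produces exact zeros in the solution. The sketch below shows only that building block, assuming a penalized l1 term; the function name is hypothetical and this is not the authors' implementation.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise to a list."""
    out = []
    for x in v:
        if x > tau:
            out.append(x - tau)   # shrink positive entries toward zero
        elif x < -tau:
            out.append(x + tau)   # shrink negative entries toward zero
        else:
            out.append(0.0)       # entries within [-tau, tau] become exactly zero
    return out

shrunk = soft_threshold([3.0, -0.5, -2.0], 1.0)
```

Entries with magnitude at most `tau` are set to zero, so a larger `tau` yields sparser canonical vectors.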