Clayton Webster

16 papers

121 citations

Novelty: 46%

AI Score: 28

Ranked #158,716 of 205,806 authors (top 77%) · #1,431 in NA (top 43%)

16 Papers

NA · Nov 8, 2017

An improved discrete least-squares/reduced-basis method for parameterized elliptic PDEs

Max Gunzburger, Michael Schneier, Clayton Webster et al.

It is shown that the computational efficiency of the discrete least-squares (DLS) approximation of solutions of stochastic elliptic PDEs is improved by incorporating a reduced-basis method into the DLS framework. The goal is to recover the entire solution map from the parameter space to the finite element space. To this end, a reduced-basis solution is first constructed using a weak greedy algorithm; a DLS approximation is then determined by evaluating the reduced-basis approximation instead of the full finite element approximation. The main advantage of the new approach is that one need only apply the DLS operator to the coefficients of the reduced-basis expansion, resulting in substantial savings in both the storage of the DLS coefficients and the online cost of evaluating the DLS approximation. In addition, the recently developed quasi-optimal polynomial space is adopted in the new approach, resulting in superior convergence rates for a wider class of problems than previously analyzed. Numerical experiments are provided that illustrate the theoretical results.

NA · Dec 14, 2018

A mixed $\ell_1$ regularization approach for sparse simultaneous approximation of parameterized PDEs

Nick Dexter, Hoang Tran, Clayton Webster

We present and analyze a novel sparse polynomial technique for the simultaneous approximation of parameterized partial differential equations (PDEs) with deterministic and stochastic inputs. Our approach treats the numerical solution as a jointly sparse reconstruction problem through a reformulation of standard basis pursuit denoising, where the set of jointly sparse vectors is infinite. To achieve global reconstruction of sparse solutions to parameterized elliptic PDEs over both physical and parametric domains, we combine the standard measurement scheme developed for compressed sensing in the context of bounded orthonormal systems with a novel mixed-norm based $\ell_1$ regularization method that exploits both energy and sparsity. In addition, we prove that, with minimal sample complexity, error estimates comparable to the best $s$-term and quasi-optimal approximations are achievable, while requiring only a priori bounds on the polynomial truncation error with respect to the energy norm. Finally, we perform extensive numerical experiments on several high-dimensional parameterized elliptic PDE models to demonstrate the superior recovery properties of the proposed approach.

NA · Jun 22, 2016

Explicit cost bounds of stochastic Galerkin approximations for parameterized PDEs with random coefficients

Nick Dexter, Clayton Webster, Guannan Zhang

This work analyzes the overall computational complexity of the stochastic Galerkin finite element method (SGFEM) for approximating the solution of parameterized elliptic partial differential equations with both affine and non-affine random coefficients. To compute the fully discrete solution, such approaches employ a Galerkin projection in both the deterministic and stochastic domains, produced here by finite elements in the former and by a global orthogonal basis, defined on an isotropic total degree index set, in the latter. To account for the sparsity of the resulting system, we present a rigorous cost analysis that considers the total number of coupled finite element systems that must be solved simultaneously in the SGFEM. However, to maintain sparsity as the coefficient becomes increasingly nonlinear in the parameterization, it is also necessary to approximate the coefficient by an additional orthogonal expansion. In this case we prove a rigorous complexity estimate for the number of floating point operations (FLOPs) required per matrix-vector multiplication of the coupled system. Based on such complexity estimates, we also develop explicit cost bounds, in terms of FLOPs, for solving the stochastic Galerkin (SG) systems to a prescribed tolerance; these bounds are compared with the minimal complexity estimates of a stochastic collocation finite element method (SCFEM) shown in our previous work [16]. Finally, computational evidence complements the theoretical estimates and supports our conclusion that, when the coefficient is affine, the coupled SG system can be solved more efficiently than the decoupled SC systems. However, as the coefficient becomes more nonlinear, obtaining an approximation with the SGFEM becomes prohibitively expensive.
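The size of the coupled SG system grows with the cardinality of the stochastic index set. As an illustrative sketch (the notation is assumed here, not taken from the paper: $N$ random parameters, total degree $p$), the isotropic total degree set $\{\nu \in \mathbb{N}_0^N : |\nu|_1 \le p\}$ has $\binom{N+p}{N}$ elements, which the code checks by enumeration. The helper names are hypothetical.

```python
from itertools import product
from math import comb


def total_degree_set(N, p):
    """All multi-indices nu in N_0^N with |nu|_1 <= p (isotropic total degree set)."""
    return [nu for nu in product(range(p + 1), repeat=N) if sum(nu) <= p]


def coupled_system_size(n_fe_dofs, N, p):
    """Unknowns in the fully coupled SG system: FE DOFs times number of polynomial modes."""
    return n_fe_dofs * comb(N + p, N)


print(len(total_degree_set(2, 2)))       # 6 modes for N = 2, p = 2
print(coupled_system_size(1000, 10, 3))  # 1000 FE DOFs times 286 modes
```

Because the count is polynomial in $p$ for fixed $N$ but grows quickly with $N$, even a modest finite element mesh leads to a large coupled system, which is why the cost analysis above tracks it explicitly.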

NA · May 25, 2019

Reconstruction of jointly sparse vectors via manifold optimization

Armenak Petrosyan, Hoang Tran, Clayton Webster

In this paper, we consider the challenge of reconstructing jointly sparse vectors from linear measurements. First, we show that by utilizing the rank of the output data matrix we can reduce the problem to a full column rank case. This result reduces the computational complexity of the original problem and enables a simple implementation of joint sparse recovery algorithms for the full-rank setting. Second, we propose a new method for joint sparse recovery in the form of a non-convex optimization problem on a non-compact Stiefel manifold. In our numerical experiments, our method outperforms the commonly used $\ell_{2,1}$ minimization in the sense that far fewer measurements are required for accurate sparse reconstructions. We postulate that this approach possesses the desirable rank-aware property, that is, the ability to take advantage of the rank of the unknown matrix to improve the recovery.
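One simple way to see the rank reduction, sketched under assumptions not stated in the abstract: write the data as $Y = AX$ with $X$ jointly sparse, and suppose $A$ is injective on vectors supported on the joint support. Then keeping only a maximal set of linearly independent columns of $Y$ gives a full column rank problem whose solution has the same row support as $X$. This sketch selects columns by Gram-Schmidt with an absolute tolerance; the paper may use a different reduction (for example an SVD), and the function names are hypothetical.

```python
import math


def independent_columns(Y, tol=1e-10):
    """Indices of a maximal set of linearly independent columns of Y (given as rows)."""
    m, n = len(Y), len(Y[0])
    basis, keep = [], []
    for j in range(n):
        v = [Y[i][j] for i in range(m)]
        for q in basis:  # remove components along previously kept columns
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        nrm = math.sqrt(sum(a * a for a in v))
        if nrm > tol:  # column adds a new direction, so keep it
            basis.append([a / nrm for a in v])
            keep.append(j)
    return keep


def reduce_to_full_column_rank(Y, tol=1e-10):
    """Return the submatrix of Y made of independent columns, plus their indices."""
    keep = independent_columns(Y, tol)
    return [[row[j] for j in keep] for row in Y], keep
```

In practice a relative tolerance (scaled by the norm of $Y$) is more robust than the fixed absolute one used here.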

LG · Oct 9, 2023

Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks

Andrew Starnes, Anton Dereventsov, Clayton Webster

In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which means certain actions are seldom, if ever, selected. We augment the optimization objective function for the policy with terms constructed from various $\varphi$-divergences and the Maximum Mean Discrepancy, which encourage current policies to follow state visitation and/or action choice distributions that differ from those of previously computed policies. We provide numerical experiments using the MNIST, CIFAR10, and Spotify datasets. The results demonstrate the advantage of diversity-promoting policy regularization and show that its use with gradient-based approaches significantly improves performance on a variety of personalization tasks. Furthermore, numerical evidence is given to show that policy regularization increases performance without losing accuracy.
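A minimal sketch of a divergence-regularized objective, assuming a single-state (bandit-like) setting with a softmax policy over actions, a single previous policy, and the KL divergence as the $\varphi$-divergence ($\varphi(t) = t \log t$). The Maximum Mean Discrepancy term and the policy-gradient update itself are omitted; all names and the weight `beta` are illustrative assumptions.

```python
import math


def softmax(logits):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q), the phi-divergence with phi(t) = t log t; assumes q > 0 where p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_objective(logits, rewards, prev_policy, beta):
    """Expected reward of a softmax policy plus a bonus for diverging from a previous policy."""
    pi = softmax(logits)
    expected_reward = sum(p * r for p, r in zip(pi, rewards))
    return expected_reward + beta * kl_divergence(pi, prev_policy)
```

Maximizing this objective (for example by gradient ascent on `logits`) trades expected reward against staying away from the previous policy, which counteracts collapse onto a few actions.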

IR · Sep 11, 2024

Mamba for Scalable and Efficient Personalized Recommendations

Andrew Starnes, Clayton Webster

In this effort, we propose using Mamba for handling tabular data in personalized recommendation systems. We present FT-Mamba (Feature Tokenizer + Mamba), a novel hybrid model that replaces the Transformer layers within the FT-Transformer architecture with Mamba layers. The Mamba model offers an efficient alternative to Transformers, reducing computational complexity from quadratic to linear by enhancing the capabilities of State Space Models (SSMs). FT-Mamba is designed to improve the scalability and efficiency of recommendation systems while maintaining performance. We evaluate FT-Mamba against a traditional Transformer-based model within a Two-Tower architecture on three datasets: Spotify music recommendation, H&M fashion recommendation, and vaccine messaging recommendation. Each model is trained on 160,000 user-action pairs, and performance is measured using precision (P), recall (R), Mean Reciprocal Rank (MRR), and Hit Ratio (HR) at several truncation values. Our results demonstrate that FT-Mamba outperforms the Transformer-based model in computational efficiency while maintaining or exceeding its performance across key recommendation metrics. By leveraging Mamba layers, FT-Mamba provides a scalable and effective solution for large-scale personalized recommendation systems, showcasing the potential of the Mamba architecture to enhance both efficiency and accuracy.
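Why an SSM layer costs linear rather than quadratic time can be seen from the recurrence it evaluates. The sketch below is a simplified, non-selective diagonal linear state-space model with fixed parameters `a`, `b`, `c` (assumed here); Mamba's selective scan additionally makes these parameters depend on the input and discretizes continuous-time dynamics, which this sketch does not attempt.

```python
def ssm_scan(x, a, b, c):
    """Linear-time scan of a diagonal linear state-space model.

    h_t = a * h_{t-1} + b * x_t (elementwise over the state), y_t = sum(c * h_t).
    Each step costs O(state size), so a length-L sequence costs O(L) work,
    versus the O(L^2) pairwise interactions of self-attention.
    """
    h = [0.0] * len(a)
    ys = []
    for xt in x:
        h = [ai * hi + bi * xt for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # [1.0, 0.5, 0.25]
```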

LG · Dec 24, 2021

On the Unreasonable Efficiency of State Space Clustering in Personalization Tasks

Anton Dereventsov, Ranga Raju Vatsavai, Clayton Webster

In this effort, we consider a reinforcement learning (RL) technique for solving personalization tasks with complex reward signals. In particular, our approach is based on state space clustering using a simple $k$-means algorithm, together with conventional choices of network architectures and optimization algorithms. Numerical examples demonstrate the efficiency of different RL procedures and illustrate that this technique accelerates the agent's ability to learn without restricting the agent's performance.
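A minimal sketch of the clustering step, assuming states are numeric tuples and that the agent acts on the resulting cluster index instead of the raw state. This is plain Lloyd's $k$-means with explicitly supplied initial centroids; how the paper initializes, how many clusters it uses, and how the cluster index feeds the network are not specified here, and the names are hypothetical.

```python
def kmeans(points, init_centroids, iters=20):
    """Lloyd's k-means on tuples; returns centroids and a state -> cluster-index function."""
    centroids = [tuple(c) for c in init_centroids]

    def nearest(p):
        # Looks up `centroids` at call time, so it always uses the latest centroids.
        return min(range(len(centroids)),
                   key=lambda k: sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[k])))

    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids, nearest
```

An agent would then replace each observed state `s` by `nearest(s)`, shrinking the state space it has to learn over.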

LG · Jun 7, 2021

Offline Policy Comparison under Limited Historical Agent-Environment Interactions

Anton Dereventsov, Joseph D. Daws, Clayton Webster

We address the challenge of policy evaluation in real-world applications of reinforcement learning systems where the available historical data is limited due to ethical, practical, or security considerations. This constrained distribution of data samples often leads to biased policy evaluation estimates. To remedy this, we propose that instead of policy evaluation one should perform policy comparison, i.e., rank the policies of interest by their value based on the available historical data. In addition, we present the Limited Data Estimator (LDE) as a simple method for evaluating and comparing policies from a small number of interactions with the environment. Our theoretical analysis shows that the LDE is statistically reliable on policy comparison tasks under mild assumptions on the distribution of the historical data. Additionally, our numerical experiments compare the LDE to other policy evaluation methods on the task of policy ranking and demonstrate its advantage in various settings.

NA · Dec 4, 2019

Analysis of Deep Neural Networks with Quasi-optimal polynomial approximation rates

Joseph Daws, Clayton Webster

We show the existence of a deep neural network capable of approximating a wide class of high-dimensional functions. The construction of the proposed neural network is based on a quasi-optimal polynomial approximation. We show that this network achieves an error rate that is sub-exponential in the number of polynomial functions, $M$, used in the polynomial approximation. The complexity of the network that achieves this sub-exponential rate is shown to be algebraic in $M$.

LG · Oct 7, 2019

Neural network integral representations with the ReLU activation function

Armenak Petrosyan, Anton Dereventsov, Clayton Webster

In this effort, we derive a formula for the integral representation of a shallow neural network with the ReLU activation function. We assume that the outer weights admit a finite $L_1$-norm with respect to Lebesgue measure on the sphere. For univariate target functions, we further provide a closed-form formula for all possible representations. Additionally, in this case our formula allows one to explicitly find the least $L_1$-norm neural network representation of a given function.
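To illustrate what an integral representation of a shallow ReLU network means in one dimension, here is a sketch based on the classical Taylor-remainder identity $f(x) = f(a) + f'(a)(x-a) + \int_a^b f''(t)\,\mathrm{ReLU}(x-t)\,dt$ for $x \in [a,b]$ and twice continuously differentiable $f$. This is one known representation, not the formula derived in the paper. Discretizing the integral with the midpoint rule yields a one-hidden-layer ReLU network whose inner biases are the quadrature nodes and whose outer weights are $h f''(t_k)$; the function names are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def shallow_relu_from_integral(f_a, df_a, d2f, a, b, n=1000):
    """Midpoint-rule discretization of
        f(x) = f(a) + f'(a)(x - a) + int_a^b f''(t) relu(x - t) dt,
    giving a one-hidden-layer ReLU network with n neurons, valid for x in [a, b]."""
    h = (b - a) / n
    biases = [a + (k + 0.5) * h for k in range(n)]  # inner biases t_k (quadrature nodes)
    outer = [h * d2f(t) for t in biases]            # outer weights h * f''(t_k)

    def network(x):
        return f_a + df_a * (x - a) + sum(w * relu(x - t) for w, t in zip(outer, biases))

    return network


square = shallow_relu_from_integral(0.0, 0.0, lambda t: 2.0, 0.0, 1.0)  # f(x) = x^2
print(square(0.5))  # close to 0.25
```

Here the continuous weight function $f''(t)$ plays the role of the outer-weight density in the integral representation, and its $L_1$-norm is the $L_1$-norm of the representation.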

LG · May 24, 2019

Robust learning with implicit residual networks

Viktor Reshniak, Clayton Webster

In this effort, we propose a new deep architecture utilizing residual blocks inspired by implicit discretization schemes. As opposed to standard feed-forward networks, the outputs of the proposed implicit residual blocks are defined as the fixed points of appropriately chosen nonlinear transformations. We show that this choice leads to improved stability of both forward and backward propagation, has a favorable impact on generalization power, and allows one to control the robustness of the network with only a few hyperparameters. In addition, the proposed reformulation of ResNet does not introduce new parameters and can potentially reduce the number of required layers due to improved forward stability. Finally, we derive a memory-efficient training algorithm, propose a stochastic regularization technique, and provide numerical results in support of our findings.
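A minimal sketch of an implicit residual block, assuming the block output is defined by $y = x + f(y)$ (a backward-Euler-style step, in contrast to the explicit ResNet update $y = x + f(x)$) and that $f$ is a contraction so plain fixed-point iteration converges. The residual function `layer` and its weights are hypothetical; the paper's choice of nonlinear transformation, solver, and training algorithm may differ.

```python
import math


def implicit_residual_block(x, f, tol=1e-12, max_iter=200):
    """Return y solving y = x + f(y) by fixed-point iteration.

    Converges when f is a contraction (Lipschitz constant < 1).
    An explicit ResNet block would simply return x + f(x).
    """
    y = list(x)
    for _ in range(max_iter):
        y_new = [xi + fi for xi, fi in zip(x, f(y))]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")


def layer(y):
    # Hypothetical residual function 0.5 * tanh(W y); its Lipschitz constant is at most
    # 0.5 * ||W||_2, which is about 0.5 here, so the iteration above contracts.
    W = [[0.8, -0.3], [0.2, 0.5]]
    return [0.5 * math.tanh(sum(w * v for w, v in zip(row, y))) for row in W]


y = implicit_residual_block([1.0, -2.0], layer)
```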

LG · May 24, 2019

Greedy Shallow Networks: An Approach for Constructing and Training Neural Networks

Anton Dereventsov, Armenak Petrosyan, Clayton Webster

We present a greedy-based approach to construct an efficient single hidden layer neural network with the ReLU activation that approximates a target function. In our approach we obtain a shallow network by utilizing a greedy algorithm with the prescribed dictionary provided by the available training data and a set of possible inner weights. To facilitate the greedy selection process we employ an integral representation of the network, based on the ridgelet transform, that significantly reduces the cardinality of the dictionary and hence promotes feasibility of the greedy selection. Our approach allows for the construction of efficient architectures which can be treated either as improved initializations to be used in place of random-based alternatives, or as fully-trained networks in certain cases, thus potentially nullifying the need for backpropagation training. Numerical experiments demonstrate the tenability of the proposed concept and its advantages compared to the conventional techniques for selecting architectures and initializations for neural networks.
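A minimal sketch of greedy neuron selection, assuming a finite dictionary of candidate inner weights and biases `(w, b)` for univariate ReLU atoms $\max(0, wx - b)$ evaluated on the training inputs. It uses the pure greedy (matching pursuit) rule, choosing the atom with the largest normalized correlation with the current residual; the paper's greedy variant and its ridgelet-based dictionary reduction are not reproduced here, and the names are hypothetical.

```python
import math


def relu_atom(xs, w, b):
    return [max(0.0, w * x - b) for x in xs]


def _normalized_correlation(residual, g):
    nrm = math.sqrt(sum(v * v for v in g))
    return abs(sum(r * v for r, v in zip(residual, g))) / nrm if nrm > 0 else 0.0


def greedy_shallow_network(xs, ys, dictionary, n_neurons):
    """Pure greedy selection of ReLU neurons from a finite dictionary of (w, b) pairs.

    Returns the chosen (w, b, outer_weight) triples and the final training residual.
    """
    residual = list(ys)
    atoms = {wb: relu_atom(xs, *wb) for wb in dictionary}
    network = []
    for _ in range(n_neurons):
        best = max(dictionary, key=lambda wb: _normalized_correlation(residual, atoms[wb]))
        g = atoms[best]
        if _normalized_correlation(residual, g) == 0.0:
            break  # no dictionary atom can reduce the residual further
        coef = sum(r * v for r, v in zip(residual, g)) / sum(v * v for v in g)
        residual = [r - coef * v for r, v in zip(residual, g)]
        network.append((best[0], best[1], coef))
    return network, residual
```

The selected triples define a shallow network that can be used either as an initialization for further training or, when the residual is already small, as the final network.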

NA · May 14, 2019

Reconstructing high-dimensional Hilbert-valued functions via compressed sensing

Nick Dexter, Hoang Tran, Clayton Webster

We present and analyze a novel sparse polynomial technique for approximating high-dimensional Hilbert-valued functions, with application to parameterized partial differential equations (PDEs) with deterministic and stochastic inputs. Our theoretical framework treats the function approximation problem as a joint sparse recovery problem, where the set of jointly sparse vectors is possibly infinite. To achieve the simultaneous reconstruction of Hilbert-valued functions in both the parametric domain and the Hilbert space, we propose a novel mixed-norm based $\ell_1$ regularization method that exploits both energy and sparsity. Our approach requires extensions of concepts such as the restricted isometry and null space properties, allowing us to prove recovery guarantees for sparse Hilbert-valued function reconstruction. We complement the theory with an algorithm for Hilbert-valued recovery based on the standard forward-backward algorithm, and establish its strong convergence in the considered infinite-dimensional setting. Finally, we demonstrate the minimal sample complexity requirements of our approach, relative to other popular methods, with numerical experiments approximating the solutions of high-dimensional parameterized elliptic PDEs.
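A finite-dimensional sketch of forward-backward (proximal gradient) iteration for a mixed-norm problem, assuming the common $\ell_{2,1}$ form $\min_C \tfrac12\|AC - Y\|_F^2 + \lambda \sum_j \|C_j\|_2$, where row $C_j$ holds the coefficients of the $j$-th polynomial across all (discretized) Hilbert-space degrees of freedom. The paper works in an infinite-dimensional setting and its exact mixed norm and weights may differ; the names and notation here are assumptions.

```python
import math


def group_soft_threshold(C, tau):
    """Prox of tau * sum_j ||C[j]||_2: shrink each row toward zero as a group."""
    out = []
    for row in C:
        nrm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / nrm) if nrm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def forward_backward(A, Y, lam, step, iters=500):
    """Minimize 0.5 * ||A C - Y||_F^2 + lam * sum_j ||C[j]||_2 by proximal gradient.

    Converges for 0 < step < 2 / ||A||_2^2.
    """
    C = [[0.0] * len(Y[0]) for _ in A[0]]
    At = transpose(A)
    for _ in range(iters):
        R = [[p - y for p, y in zip(pr, yr)] for pr, yr in zip(matmul(A, C), Y)]  # A C - Y
        G = matmul(At, R)  # gradient of the data-fit term (forward step)
        C = group_soft_threshold(  # proximal (backward) step
            [[c - step * g for c, g in zip(cr, gr)] for cr, gr in zip(C, G)], step * lam)
    return C
```

Shrinking whole rows, rather than individual entries, is what couples the sparsity pattern across all spatial degrees of freedom: a polynomial coefficient is either kept for the entire Hilbert-valued function or discarded.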

NA · Oct 6, 2018

Analysis of sparse recovery for Legendre expansions using envelope bound

Hoang Tran, Clayton Webster

We provide novel sufficient conditions for the uniform recovery of sparse Legendre expansions using $\ell_1$ minimization, where the sampling points are drawn according to the orthogonalization (uniform) measure. So far, conditions of the form $m \gtrsim \Theta^2 s \times \text{log factors}$ have been relied on to determine the minimum number of samples $m$ that guarantees successful reconstruction of $s$-sparse vectors when the measurement matrix is associated with an orthonormal system. However, in the case of sparse Legendre expansions, the uniform bound $\Theta$ of Legendre systems is so high that these conditions are unable to provide meaningful guarantees. In this paper, we present an analysis that instead employs the envelope bound of all Legendre polynomials, and prove a new recovery guarantee for $s$-sparse Legendre expansions, $$ m \gtrsim s^2 \times \text{log factors}, $$ which is independent of $\Theta$. Arguably, this is the first recovery condition established for orthonormal systems without assuming the uniform boundedness of the sampling matrix. The key ingredient of our analysis is an extension of chaining arguments, recently developed in [Bou14,CDTW15], to handle the envelope bound. Furthermore, our recovery condition is proved via the restricted eigenvalue property, a less demanding replacement of the restricted isometry property that is perfectly suited to the considered scenario. Along the way, we derive simple criteria to detect good sample sets. Our numerical tests show that sets of uniformly sampled points that meet these criteria yield better recovery on average.
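To see why $\Theta$ is large for Legendre systems: the Legendre polynomial orthonormal with respect to the uniform probability measure on $[-1,1]$ is $L_n = \sqrt{2n+1}\,P_n$, and since $|P_n| \le 1$ with $P_n(1) = 1$, its sup-norm is $\sqrt{2n+1}$. For a tensor-product basis indexed by a set $\Lambda$ this gives $\Theta^2 = \max_{\nu \in \Lambda} \prod_i (2\nu_i + 1)$. The sketch below evaluates both facts; it ignores constants and log factors and does not implement the envelope-bound analysis itself. Function names are hypothetical.

```python
import math


def legendre_orthonormal(n, x):
    """Legendre polynomial orthonormal w.r.t. the uniform probability measure on [-1, 1]."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(1, n):  # Bonnet's recurrence gives P_2, ..., P_n
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return math.sqrt(2 * n + 1) * p


def theta_squared(index_set):
    """Theta^2 = max over nu of prod_i (2 nu_i + 1): squared sup-norm of the tensor basis."""
    return max(math.prod(2 * v + 1 for v in nu) for nu in index_set)


# One dimension, degrees 0..9999, sparsity s = 10 (log factors ignored):
s = 10
theta2 = theta_squared([(v,) for v in range(10000)])
print(theta2 * s, s ** 2)  # 199990 versus 100
```

The comparison shows how the $\Theta^2 s$ requirement grows with the polynomial degree of the candidate space, while the $s^2$ condition does not depend on it.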

NA · Aug 4, 2015

An Efficient Meshfree Implicit Filter for Nonlinear Filtering Problems

Feng Bao, Yanzhao Cao, Clayton Webster et al.

In this paper, we propose a meshfree approximation method for the implicit filter developed in [2], which is a novel numerical algorithm for nonlinear filtering problems. The implicit filter approximates conditional distributions in the optimal filter over a deterministic state space grid and is developed from samples of the current state obtained by solving the state equation implicitly. The purpose of the meshfree approximation is to improve the efficiency of the implicit filter in moderately high-dimensional problems. The construction of the algorithm includes generation of random state space points and a meshfree interpolation method. Numerical experiments show the effectiveness and efficiency of our algorithm.
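The abstract does not say which meshfree interpolation is used, so the sketch below swaps in inverse-distance-weighted (Shepard) interpolation as a simple stand-in. It draws random points in a two-dimensional state space and interpolates a hypothetical unnormalized density from its values at those points, illustrating how a grid can be replaced by scattered nodes. All names and the density are assumptions.

```python
import math
import random


def shepard_interpolant(nodes, values, power=2.0):
    """Inverse-distance-weighted (Shepard) interpolant on scattered nodes."""
    def interp(x):
        weights = []
        for node, val in zip(nodes, values):
            d = math.dist(x, node)
            if d == 0.0:
                return val  # exact at a node
            weights.append(d ** -power)
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return interp


rng = random.Random(0)
nodes = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]  # random state points
density = [math.exp(-(x * x + y * y)) for x, y in nodes]  # hypothetical unnormalized density
p = shepard_interpolant(nodes, density)
```

Shepard weights form a convex combination, so the interpolant stays between the smallest and largest node values; higher-order meshfree schemes (for example radial basis functions) trade that property for better accuracy.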

NA · Jul 27, 2015

Numerical Methods for a Class of Nonlocal Diffusion Problems with the Use of Backward SDEs

Guannan Zhang, Weidong Zhao, Clayton Webster et al.

We propose a novel numerical approach for nonlocal diffusion equations [8] with integrable kernels, based on the relationship between the backward Kolmogorov equation and backward stochastic differential equations (BSDEs) driven by Lévy processes with jumps. The nonlocal diffusion problem under consideration is converted to a BSDE, for which numerical schemes are developed and applied directly. As a stochastic approach, the proposed method does not require the solution of linear systems, which allows for embarrassingly parallel implementations and also enables adaptive approximation techniques to be incorporated in a straightforward fashion. Moreover, our method is more accurate than classic stochastic approaches due to the use of high-order temporal and spatial discretization schemes. In addition, our approach can handle a broad class of problems with general nonlinear forcing terms as long as they are globally Lipschitz continuous. A rigorous error analysis of the new method is provided, as well as several numerical examples that illustrate the effectiveness and efficiency of the proposed approach.