NAFeb 22, 2011
Shifted Power Method for Computing Tensor EigenpairsTamara G. Kolda, Jackson R. Mayo
Recent work on eigenvalues and eigenvectors for tensors of order m >= 3 has been motivated by applications in blind source separation, magnetic resonance imaging, molecular conformation, and more. In this paper, we consider methods for computing real symmetric-tensor eigenpairs of the form Ax^{m-1} = λx subject to ||x||=1, which is closely related to optimal rank-1 approximation of a symmetric tensor. Our contribution is a shifted symmetric higher-order power method (SS-HOPM), which we show is guaranteed to converge to a tensor eigenpair. SS-HOPM can be viewed as a generalization of the power iteration method for matrices or of the symmetric higher-order power method. Additionally, using fixed point analysis, we can characterize exactly which eigenpairs can and cannot be found by the method. Numerical examples are presented, including examples from an extension of the method to finding complex eigenpairs.
NAAug 14, 2012
On Tensors, Sparsity, and Nonnegative FactorizationsEric C. Chi, Tamara G. Kolda
Tensors have found application in a variety of fields, ranging from chemometrics to signal processing and beyond. In this paper, we consider the problem of multilinear modeling of sparse count data. Our goal is to develop a descriptive tensor factorization model of such data, along with appropriate algorithms and theory. To do so, we propose that the random variation is best described via a Poisson distribution, which better describes the zeros observed in the data as compared to the typical assumption of a Gaussian distribution. Under a Poisson assumption, we fit a model to observed data using the negative log-likelihood score. We present a new algorithm for Poisson tensor factorization called CANDECOMP-PARAFAC Alternating Poisson Regression (CP-APR) that is based on a majorization-minimization approach. It can be shown that CP-APR is a generalization of the Lee-Seung multiplicative updates. We show how to prevent the algorithm from converging to non-KKT points and prove convergence of CP-APR under mild conditions. We also explain how to implement CP-APR for large-scale sparse tensors and present results on several data sets, both real and simulated.
NAOct 22, 2017
A Practical Randomized CP Tensor DecompositionCasey Battaglino, Grey Ballard, Tamara G. Kolda
The CANDECOMP/PARAFAC (CP) decomposition is a leading method for the analysis of multiway data. The standard alternating least squares algorithm for the CP decomposition (CP-ALS) involves a series of highly overdetermined linear least squares problems. We extend randomized least squares methods to tensors and show the workload of CP-ALS can be drastically reduced without a sacrifice in quality. We introduce techniques for efficiently preprocessing, sampling, and computing randomized least squares on a dense tensor of arbitrary order, as well as an efficient sampling-based technique for checking the stopping condition. We also show more generally that the Khatri-Rao product (used within the CP-ALS iteration) produces conditions favorable for direct sampling. In numerical results, we see improvements in speed, reductions in memory requirements, and robustness with respect to initialization.
NAFeb 23, 2016
Parallel Tensor Compression for Large-Scale Scientific DataWoody Austin, Grey Ballard, Tamara G. Kolda
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8~TB of data, assuming double precision. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 5000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
NAApr 9, 2014
Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric TensorsMartin D. Schatz, Tze Meng Low, Robert A. van de Geijn et al.
Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation is in conflict with a desire to simplify memory access patterns. In this paper, we propose a blocked data structure (Blocked Compact Symmetric Storage) wherein we consider the tensor by blocks and store only the unique blocks of a symmetric tensor. We propose an algorithm-by-blocks, already shown of benefit for matrix computations, that exploits this storage format by utilizing a series of temporary tensors to avoid redundant computation. Further, partial symmetry within temporaries is exploited to further avoid redundant storage and redundant computation. A detailed analysis shows that, relative to storing and computing with tensors without taking advantage of symmetry and partial symmetry, storage requirements are reduced by a factor of $ O\left( m! \right)$ and computational requirements by a factor of $O\left( (m+1)!/2^m \right)$, where $ m $ is the order of the tensor. However, as the analysis shows, care must be taken in choosing the correct block size to ensure these storage and computational benefits are achieved (particularly for low-order tensors). An implementation demonstrates that storage is greatly reduced and the complexity introduced by storing and computing with tensors by blocks is manageable. Preliminary results demonstrate that computational time is also reduced. The paper concludes with a discussion of how insights in this paper point to opportunities for generalizing recent advances in the domain of linear algebra libraries to the field of multi-linear computation.
NAOct 14, 2010
Making Tensor Factorizations Robust to Non-Gaussian NoiseEric C. Chi, Tamara G. Kolda
Tensors are multi-way arrays, and the Candecomp/Parafac (CP) tensor factorization has found application in many different domains. The CP model is typically fit using a least squares objective function, which is a maximum likelihood estimate under the assumption of i.i.d. Gaussian noise. We demonstrate that this loss function can actually be highly sensitive to non-Gaussian noise. Therefore, we propose a loss function based on the 1-norm because it can accommodate both Gaussian and grossly non-Gaussian perturbations. We also present an alternating majorization-minimization algorithm for fitting a CP model using our proposed loss function.
AIFeb 5
First ProofMohammed Abouzaid, Andrew J. Blumberg, Martin Hairer et al.
To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time.
NAAug 11, 2024
Tensor Decomposition Meets RKHS: Efficient Algorithms for Smooth and Misaligned DataBrett W. Larsen, Tamara G. Kolda, Anru R. Zhang et al.
The canonical polyadic (CP) tensor decomposition decomposes a multidimensional data array into a sum of outer products of finite-dimensional vectors. Instead, we can replace some or all of the vectors with continuous functions (infinite-dimensional vectors) from a reproducing kernel Hilbert space (RKHS). We refer to tensors with some infinite-dimensional modes as quasitensors, and the approach of decomposing a tensor with some continuous RKHS modes is referred to as CP-HiFi (hybrid infinite and finite dimensional) tensor decomposition. An advantage of CP-HiFi is that it can enforce smoothness in the infinite dimensional modes. Further, CP-HiFi does not require the observed data to lie on a regular and finite rectangular grid and naturally incorporates misaligned data. We detail the methodology and illustrate it on a synthetic example.
24.0NAMar 27
Revisiting Approximate Leverage Score Sketching for Matrix Least SquaresBrett W. Larsen, Tamara G. Kolda
We revisit the problem of sketching using approximate leverage scores for matrix least squares problems of the form $\| AX - B \|_F^2$ where the design matrix $A \in \mathbb{R}^{N \times r}$ is tall and skinny with $N \gg r$. We derive the theoretical results from first principles and clarify the relation to previously stated bounds, improving some constants along the way. One can characterize the utility of a sketching scheme according to the number of samples it needs for an $\varepsilon$-accurate solution with high probability. Assuming $\varepsilon$ is suitably small, we will show that approximate leverage score sampling requires $4r/(βδ\varepsilon)$ samples, where $δ$ is the failure probability and $β\in (0,1]$ is a measure of the quality of the approximate leverage scores such that $β=1$ corresponds to using exact leverage scores. In cases where a few approximate leverage scores are very large (summing to $p_{\rm det}$), we also show that using a hybrid deterministic and random sampling scheme reduces the required number of samples by a factor of $1/(1-p_{\rm det})$.
32.3NAMar 26
Fast and Accurate CP-HIFI Tensor Decompositions: Exploiting Kronecker StructureJohannes J. Brust, Tamara G. Kolda
Tensor decompositions are a fundamental tool in scientific computing and data analysis. In many applications -- such as simulation data on irregular grids, surrogate modeling for parameterized PDEs, or spectroscopic measurements -- the data has both discrete and continuous structure, and may only be observed at scattered sample points. The CP-HIFI (hybrid infinite-finite) decomposition generalizes the Canonical Polyadic (CP) tensor decomposition to settings where some factors are finite-dimensional vectors and others are functions drawn from infinite-dimensional spaces. The decomposition can be applied to a fully observed tensor (aligned) or, when only scattered observations are available, to a sparsely sampled tensor (unaligned). Current methods compute CP-HIFI factors by solving a sequence of dense linear systems arising from regularized least-squares problems to fit reproducing Kernel Hilbert space (RKHS) representations to the data, but these direct solves become computationally prohibitive as problem size grows. We propose new algorithms that achieve the same accuracy while being orders of magnitude faster. For aligned tensors, we exploit the Kronecker structure of the system to efficiently compute its eigendecomposition without ever forming the full system, reducing the solve to independent scalar equations. For unaligned tensors, we introduce a preconditioned conjugate gradient method, exploiting the problem's structure for fast matrix-vector products and efficient preconditioning. In our experiments, the proposed methods speed up the solution up to 500x compared to the prior naive direct methods, in line with the reduction in the theoretical computational complexity.
LGMay 11, 2023
Convergence of Alternating Gradient Descent for Matrix FactorizationRachel Ward, Tamara G. Kolda
We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective. We show that, for a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, $T = C (\frac{σ_1(\mathbf{A})}{σ_r(\mathbf{A})})^2 \log(1/ε)$ iterations of alternating gradient descent suffice to reach an $ε$-optimal factorization $\| \mathbf{A} - \mathbf{X} \mathbf{Y}^{T} \|^2 \leq ε\| \mathbf{A}\|^2$ with high probability starting from an atypical random initialization. The factors have rank $d \geq r$ so that $\mathbf{X}_{T}\in\mathbb{R}^{m \times d}$ and $\mathbf{Y}_{T} \in\mathbb{R}^{n \times d}$, and mild overparameterization suffices for the constant $C$ in the iteration complexity $T$ to be an absolute constant. Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves the convergence rate of gradient descent in practice. Our proof is conceptually simple: a uniform Polyak-Łojasiewicz (PL) inequality and uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our random initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
MLFeb 14, 2022
Tensor Moments of Gaussian Mixture Models: Theory and ApplicationsJoão M. Pereira, Joe Kileel, Tamara G. Kolda
Gaussian mixture models (GMMs) are fundamental tools in statistical and data sciences. We study the moments of multivariate Gaussians and GMMs. The $d$-th moment of an $n$-dimensional random variable is a symmetric $d$-way tensor of size $n^d$, so working with moments naively is assumed to be prohibitively expensive for $d>2$ and larger values of $n$. In this work, we develop theory and numerical methods for \emph{implicit computations} with moment tensors of GMMs, reducing the computational and storage costs to $\mathcal{O}(n^2)$ and $\mathcal{O}(n^3)$, respectively, for general covariance matrices, and to $\mathcal{O}(n)$ and $\mathcal{O}(n)$, respectively, for diagonal ones. We derive concise analytic expressions for the moments in terms of symmetrized tensor products, relying on the correspondence between symmetric tensors and homogeneous polynomials, and combinatorial identities involving Bell polynomials. The primary application of this theory is to estimating GMM parameters (means and covariances) from a set of observations, when formulated as a moment-matching optimization problem. If there is a known and common covariance matrix, we also show it is possible to debias the data observations, in which case the problem of estimating the unknown means reduces to symmetric CP tensor decomposition. Numerical results validate and illustrate the numerical efficiency of our approaches. This work potentially opens the door to the competitiveness of the method of moments as compared to expectation maximization methods for parameter estimation of GMMs.
NAOct 27, 2021
Streaming Generalized Canonical Polyadic Tensor DecompositionsEric Phipps, Nick Johnson, Tamara G. Kolda
In this paper, we develop a method which we call OnlineGCP for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data. GCP differs from traditional canonical polyadic (CP) tensor decompositions as it allows for arbitrary objective functions which the CP model attempts to minimize. This approach can provide better fits and more interpretable models when the observed tensor data is strongly non-Gaussian. In the streaming case, tensor data is gradually observed over time and the algorithm must incrementally update a GCP factorization with limited access to prior data. In this work, we extend the GCP formalism to the streaming context by deriving a GCP optimization problem to be solved as new tensor data is observed, formulate a tunable history term to balance reconstruction of recently observed data with data observed in the past, develop a scalable solution strategy based on segregated solves using stochastic gradient descent methods, describe a software implementation that provides performance and portability to contemporary CPU and GPU architectures and integrates with Matlab for enhanced useability, and demonstrate the utility and performance of the approach and software on several synthetic and real tensor data sets.
AIApr 19, 2021
Randomized Algorithms for Scientific Computing (RASC)Aydin Buluc, Tamara G. Kolda, Stefan M. Wild et al.
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of that workshop, "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
NAJun 4, 2019
Stochastic Gradients for Large-Scale Tensor DecompositionTamara G. Kolda, David Hong
Tensor decomposition is a well-known tool for multiway data analysis. This work proposes using stochastic gradients for efficient generalized canonical polyadic (GCP) tensor decomposition of large-scale tensors. GCP tensor decomposition is a recently proposed version of tensor decomposition that allows for a variety of loss functions such as Bernoulli loss for binary data or Huber loss for robust estimation. The stochastic gradient is formed from randomly sampled elements of the tensor and is efficient because it can be computed using the sparse matricized-tensor-times-Khatri-Rao product (MTTKRP) tensor kernel. For dense tensors, we simply use uniform sampling. For sparse tensors, we propose two types of stratified sampling that give precedence to sampling nonzeros. Numerical results demonstrate the advantages of the proposed approach and its scalability to large-scale problems.
MLAug 22, 2018
XPCA: Extending PCA for a Combination of Discrete and Continuous VariablesClifford Anderson-Bergman, Tamara G. Kolda, Kina Kincher-Winoto
Principal component analysis (PCA) is arguably the most popular tool in multivariate exploratory data analysis. In this paper, we consider the question of how to handle heterogeneous variables that include continuous, binary, and ordinal. In the probabilistic interpretation of low-rank PCA, the data has a normal multivariate distribution and, therefore, normal marginal distributions for each column. If some marginals are continuous but not normal, the semiparametric copula-based principal component analysis (COCA) method is an alternative to PCA that combines a Gaussian copula with nonparametric marginals. If some marginals are discrete or semi-continuous, we propose a new extended PCA (XPCA) method that also uses a Gaussian copula and nonparametric marginals and accounts for discrete variables in the likelihood calculation by integrating over appropriate intervals. Like PCA, the factors produced by XPCA can be used to find latent structure in data, build predictive models, and perform dimensionality reduction. We present the new model, its induced likelihood function, and a fitting algorithm which can be applied in the presence of missing data. We demonstrate how to use XPCA to produce an estimated full conditional distribution for each data point, and use this to produce to provide estimates for missing data that are automatically range respecting. We compare the methods as applied to simulated and real-world data sets that have a mixture of discrete and continuous variables.
NAAug 22, 2018
Generalized Canonical Polyadic Tensor DecompositionDavid Hong, Tamara G. Kolda, Jed A. Duersch
Tensor decomposition is a fundamental unsupervised machine learning method in data science, with applications including network analysis and sensor data processing. This work develops a generalized canonical polyadic (GCP) low-rank tensor decomposition that allows other loss functions besides squared error. For instance, we can use logistic loss or Kullback-Leibler divergence, enabling tensor decomposition for binary or count data. We present a variety statistically-motivated loss functions for various scenarios. We provide a generalized framework for computing gradients and handling missing data that enables the use of standard optimization methods for fitting the model. We demonstrate the flexibility of GCP on several real-world examples including interactions in a social network, neural activity in a mouse, and monthly rainfall measurements in India.