Kim Batselier

LG
h-index6
30papers
259citations
Novelty52%
AI Score56

30 Papers

SYMay 6, 2018
Multidimensional Realization Theory and Polynomial System Solving

Philippe Dreesen, Kim Batselier, Bart De Moor

Multidimensional systems are becoming increasingly important as they provide a promising tool for estimation, simulation and control, while going beyond the traditional setting of one-dimensional systems. The analysis of multidimensional systems is linked to multivariate polynomials, and is therefore more difficult than the well-known analysis of one-dimensional systems, which is linked to univariate polynomials. In the current paper we relate the realization theory for overdetermined autonomous multidimensional systems to the problem of solving a system of polynomial equations. We show that basic notions of linear algebra suffice to analyze and solve the problem. The difference equations are associated with a Macaulay matrix formulation, and it is shown that the null space of the Macaulay matrix is a multidimensional observability matrix. Application of the classical shift trick from realization theory allows for the computation of the corresponding system matrices in a multidimensional state-space setting. This reduces the task of solving a system of polynomial equations to computing an eigenvalue decomposition. We study the occurrence of multiple solutions, as well as the existence and analysis of solutions at infinity, which allow for an interpretation in terms of multidimensional descriptor systems.

NAOct 13, 2016
Tensor Computation: A New Framework for High-Dimensional Problems in EDA

Zheng Zhang, Kim Batselier, Haotian Liu et al.

Many critical EDA problems suffer from the curse of dimensionality, i.e. the very fast-scaling computational burden produced by large number of parameters and/or unknown variables. This phenomenon may be caused by multiple spatial or temporal factors (e.g. 3-D field solvers discretizations and multi-rate circuit simulation), nonlinearity of devices and circuits, large number of design or optimization parameters (e.g. full-chip routing/placement and circuit sizing), or extensive process variations (e.g. variability/reliability analysis and design for manufacturability). The computational challenges generated by such high dimensional problems are generally hard to handle efficiently with traditional EDA core algorithms that are based on matrix and vector computation. This paper presents "tensor computation" as an alternative general framework for the development of efficient EDA algorithms and tools. A tensor is a high-dimensional generalization of a matrix and a vector, and is a natural choice for both storing and solving efficiently high-dimensional EDA problems. This paper gives a basic tutorial on tensors, demonstrates some recent examples of EDA applications (e.g., nonlinear circuit modeling and high-dimensional uncertainty quantification), and suggests further open EDA problems where the use of tensor computation could be of advantage.

SYOct 18, 2016
A Tensor Network Kalman filter with an application in recursive MIMO Volterra system identification

Kim Batselier, Zhongming Chen, Ngai Wong

This article introduces a Tensor Network Kalman filter, which can estimate state vectors that are exponentially large without ever having to explicitly construct them. The Tensor Network Kalman filter also easily accommodates the case where several different state vectors need to be estimated simultaneously. The key lies in rewriting the standard Kalman equations as tensor equations and then implementing them using Tensor Networks, which effectively transforms the exponential storage cost and computational complexity into a linear one. We showcase the power of the proposed framework through an application in recursive nonlinear system identification of high-order discrete-time multiple-input multiple-output (MIMO) Volterra systems. The identification problem is transformed into a linear state estimation problem wherein the state vector contains all Volterra kernel coefficients and is estimated using the Tensor Network Kalman filter. The accuracy and robustness of the scheme are demonstrated via numerical experiments, which show that updating the Kalman filter estimate of a state vector of length $10^9$ and its covariance matrix takes about 0.007s on a standard desktop computer in Matlab.

SYSep 26, 2017
Tensor network subspace identification of polynomial state space models

Kim Batselier, Ching Yun Ko, Ngai Wong

This article introduces a tensor network subspace algorithm for the identification of specific polynomial state space models. The polynomial nonlinearity in the state space model is completely written in terms of a tensor network, thus avoiding the curse of dimensionality. We also prove how the block Hankel data matrices in the subspace method can be exactly represented by low rank tensor networks, reducing the computational and storage complexity significantly. The performance and accuracy of our subspace identification algorithm are illustrated by numerical experiments, showing that our tensor network implementation is around 20 times faster than the standard matrix implementation before the latter fails due to insufficient memory, is robust with respect to noise and can model real-world systems.

NANov 9, 2018
The trouble with tensor ring decompositions

Kim Batselier

The tensor train decomposition decomposes a tensor into a "train" of 3-way tensors that are interconnected through the summation of auxiliary indices. The decomposition is stable, has a well-defined notion of rank and enables the user to perform various linear algebra operations on vectors and matrices of exponential size in a computationally efficient manner. The tensor ring decomposition replaces the train by a ring through the introduction of one additional auxiliary variable. This article discusses a major issue with the tensor ring decomposition: its inability to compute an exact minimal-rank decomposition from a decomposition with sub-optimal ranks. Both the contraction operation and Hadamard product are motivated from applications and it is shown through simple examples how the tensor ring-rounding procedure fails to retrieve minimal-rank decompositions with these operations. These observations, together with the already known issue of not being able to find a best low-rank tensor ring approximation to a given tensor indicate that the applicability of tensor rings is severely limited.

NAOct 18, 2016
Tensor Network alternating linear scheme for MIMO Volterra system identification

Kim Batselier, Zhongming Chen, Ngai Wong

This article introduces two Tensor Network-based iterative algorithms for the identification of high-order discrete-time nonlinear multiple-input multiple-output (MIMO) Volterra systems. The system identification problem is rewritten in terms of a Volterra tensor, which is never explicitly constructed, thus avoiding the curse of dimensionality. It is shown how each iteration of the two identification algorithms involves solving a linear system of low computational complexity. The proposed algorithms are guaranteed to monotonically converge and numerical stability is ensured through the use of orthogonal matrix factorizations. The performance and accuracy of the two identification algorithms are illustrated by numerical experiments, where accurate degree-10 MIMO Volterra models are identified in about 1 second in Matlab on a standard desktop pc.

LGSep 11, 2023
Quantized Fourier and Polynomial Features for more Expressive Tensor Network Models

Frederiek Wesel, Kim Batselier

In the context of kernel machines, polynomial and Fourier features are commonly used to provide a nonlinear extension to linear models by mapping the data to a higher-dimensional space. Unless one considers the dual formulation of the learning problem, which renders exact large-scale learning unfeasible, the exponential increase of model parameters in the dimensionality of the data caused by their tensor-product structure prohibits to tackle high-dimensional problems. One of the possible approaches to circumvent this exponential scaling is to exploit the tensor structure present in the features by constraining the model weights to be an underparametrized tensor network. In this paper we quantize, i.e. further tensorize, polynomial and Fourier features. Based on this feature quantization we propose to quantize the associated model weights, yielding quantized models. We show that, for the same number of model parameters, the resulting quantized models have a higher bound on the VC-dimension as opposed to their non-quantized counterparts, at no additional computational cost while learning from identical features. We verify experimentally how this additional tensorization regularizes the learning problem by prioritizing the most salient features in the data and how it provides models with increased generalization capabilities. We finally benchmark our approach on large regression task, achieving state-of-the-art results on a laptop computer.

LGMay 25, 2022
Position: Tensor Networks are a Valuable Asset for Green AI

Eva Memmel, Clara Menzen, Jetze Schuurmans et al.

For the first time, this position paper introduces a fundamental link between tensor networks (TNs) and Green AI, highlighting their synergistic potential to enhance both the inclusivity and sustainability of AI research. We argue that TNs are valuable for Green AI due to their strong mathematical backbone and inherent logarithmic compression potential. We undertake a comprehensive review of the ongoing discussions on Green AI, emphasizing the importance of sustainability and inclusivity in AI research to demonstrate the significance of establishing the link between Green AI and TNs. To support our position, we first provide a comprehensive overview of efficiency metrics proposed in Green AI literature and then evaluate examples of TNs in the fields of kernel machines and deep learning using the proposed efficiency metrics. This position paper aims to incentivize meaningful, constructive discussions by bridging fundamental principles of Green AI and TNs. We advocate for researchers to seriously evaluate the integration of TNs into their research projects, and in alignment with the link established in this paper, we support prior calls encouraging researchers to treat Green AI principles as a research priority.

MLOct 31, 2023
Projecting basis functions with tensor networks for Gaussian process regression

Clara Menzen, Eva Memmel, Kim Batselier et al.

This paper presents a method for approximate Gaussian process (GP) regression with tensor networks (TNs). A parametric approximation of a GP uses a linear combination of basis functions, where the accuracy of the approximation depends on the total number of basis functions $M$. We develop an approach that allows us to use an exponential amount of basis functions without the corresponding exponential computational complexity. The key idea to enable this is using low-rank TNs. We first find a suitable low-dimensional subspace from the data, described by a low-rank TN. In this low-dimensional subspace, we then infer the weights of our model by solving a Bayesian inference problem. Finally, we project the resulting weights back to the original space to make GP predictions. The benefit of our approach comes from the projection to a smaller subspace: It modifies the shape of the basis functions in a way that it sees fit based on the given data, and it allows for efficient computations in the smaller subspace. In an experiment with an 18-dimensional benchmark data set, we show the applicability of our method to an inverse dynamics problem.

LGSep 5, 2024
Tensor network square root Kalman filter for online Gaussian process regression

Clara Menzen, Manon Kok, Kim Batselier

The state-of-the-art tensor network Kalman filter lifts the curse of dimensionality for high-dimensional recursive estimation problems. However, the required rounding operation can cause filter divergence due to the loss of positive definiteness of covariance matrices. We solve this issue by developing, for the first time, a tensor network square root Kalman filter, and apply it to high-dimensional online Gaussian process regression. In our experiments, we demonstrate that our method is equivalent to the conventional Kalman filter when choosing a full-rank tensor network. Furthermore, we apply our method to a real-life system identification problem where we estimate $4^{14}$ parameters on a standard laptop. The estimated model outperforms the state-of-the-art tensor network Kalman filter in terms of prediction accuracy and uncertainty quantification.

SYMar 17, 2020Code
Nonlinear system identification with regularized Tensor Network B-splines

Ridvan Karagoz, Kim Batselier

This article introduces the Tensor Network B-spline model for the regularized identification of nonlinear systems using a nonlinear autoregressive exogenous (NARX) approach. Tensor network theory is used to alleviate the curse of dimensionality of multivariate B-splines by representing the high-dimensional weight tensor as a low-rank approximation. An iterative algorithm based on the alternating linear scheme is developed to directly estimate the low-rank tensor network approximation, removing the need to ever explicitly construct the exponentially large weight tensor. This reduces the computational and storage complexity significantly, allowing the identification of NARX systems with a large number of inputs and lags. The proposed algorithm is numerically stable, robust to noise, guaranteed to monotonically converge, and allows the straightforward incorporation of regularization. The TNBS-NARX model is validated through the identification of the cascaded watertank benchmark nonlinear system, on which it achieves state-of-the-art performance while identifying a 16-dimensional B-spline surface in 4 seconds on a standard desktop computer. An open-source MATLAB implementation is available on GitHub.

MLApr 29
Laplace Approximation for Bayesian Tensor Network Kernel Machines

Albert Saiapin, Kim Batselier

Uncertainty estimation is essential for robust decision-making in the presence of ambiguous or out-of-distribution inputs. Gaussian Processes (GPs) are classical kernel-based models that offer principled uncertainty quantification and perform well on small- to medium-scale datasets. Alternatively, formulating the weight space learning problem under tensor network assumptions yields scalable tensor network kernel machines. However, these assumptions break Gaussianity, complicating standard probabilistic inference. This raises a fundamental question: how can tensor network kernel machines provide principled uncertainty estimates? We propose a novel Bayesian Tensor Network Kernel Machine (LA-TNKM) that employs a (linearized) Laplace approximation for Bayesian inference. A comprehensive set of numerical experiments shows that the proposed method consistently matches or surpasses Gaussian Processes and Bayesian Neural Networks (BNNs) across diverse UCI regression benchmarks, highlighting both its effectiveness and practical relevance.

LGDec 2, 2025
Tensor Network Based Feature Learning Model

Albert Saiapin, Kim Batselier

Many approximations were suggested to circumvent the cubic complexity of kernel-based algorithms, allowing their application to large-scale datasets. One strategy is to consider the primal formulation of the learning problem by mapping the data to a higher-dimensional space using tensor-product structured polynomial and Fourier features. The curse of dimensionality due to these tensor-product features was effectively solved by a tensor network reparameterization of the model parameters. However, another important aspect of model training - identifying optimal feature hyperparameters - has not been addressed and is typically handled using the standard cross-validation approach. In this paper, we introduce the Feature Learning (FL) model, which addresses this issue by representing tensor-product features as a learnable Canonical Polyadic Decomposition (CPD). By leveraging this CPD structure, we efficiently learn the hyperparameters associated with different features alongside the model parameters using an Alternating Least Squares (ALS) optimization method. We prove the effectiveness of the FL model through experiments on real data of various dimensionality and scale. The results show that the FL model can be consistently trained 3-5 times faster than and have the prediction quality on par with a standard cross-validated model.

MLDec 2, 2025
Laplace Approximation For Tensor Train Kernel Machines In System Identification

Albert Saiapin, Kim Batselier

To address the scalability limitations of Gaussian process (GP) regression, several approximation techniques have been proposed. One such method is based on tensor networks, which utilizes an exponential number of basis functions without incurring exponential computational cost. However, extending this model to a fully probabilistic formulation introduces several design challenges. In particular, for tensor train (TT) models, it is unclear which TT-core should be treated in a Bayesian manner. We introduce a Bayesian tensor train kernel machine that applies Laplace approximation to estimate the posterior distribution over a selected TT-core and employs variational inference (VI) for precision hyperparameters. Experiments show that core selection is largely independent of TT-ranks and feature structure, and that VI replaces cross-validation while offering up to 65x faster training. The method's effectiveness is demonstrated on an inverse dynamics problem.

MLNov 25, 2025
A Fully Probabilistic Tensor Network for Regularized Volterra System Identification

Afra Kilic, Kim Batselier

Modeling nonlinear systems with Volterra series is challenging because the number of kernel coefficients grows exponentially with the model order. This work introduces Bayesian Tensor Network Volterra kernel machines (BTN-V), extending the Bayesian Tensor Network framework to Volterra system identification. BTN-V represents Volterra kernels using canonical polyadic decomposition, reducing model complexity from O(I^D) to O(DIR). By treating all tensor components and hyperparameters as random variables, BTN-V provides predictive uncertainty estimation at no additional computational cost. Sparsity-inducing hierarchical priors enable automatic rank determination and the learning of fading-memory behavior directly from data, improving interpretability and preventing overfitting. Empirical results demonstrate competitive accuracy, enhanced uncertainty quantification, and reduced computational cost.

MLJul 15, 2025
Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection

Afra Kilic, Kim Batselier

Tensor Network (TN) Kernel Machines speed up model learning by representing parameters as low-rank TNs, reducing computation and memory use. However, most TN-based Kernel methods are deterministic and ignore parameter uncertainty. Further, they require manual tuning of model complexity hyperparameters like tensor rank and feature dimensions, often through trial-and-error or computationally costly methods like cross-validation. We propose Bayesian Tensor Network Kernel Machines, a fully probabilistic framework that uses sparsity-inducing hierarchical priors on TN factors to automatically infer model complexity. This enables automatic inference of tensor rank and feature dimensions, while also identifying the most relevant features for prediction, thereby enhancing model interpretability. All the model parameters and hyperparameters are treated as latent variables with corresponding priors. Given the Bayesian approach and latent variable dependencies, we apply a mean-field variational inference to approximate their posteriors. We show that applying a mean-field approximation to TN factors yields a Bayesian ALS algorithm with the same computational complexity as its deterministic counterpart, enabling uncertainty quantification at no extra computational cost. Experiments on synthetic and real-world datasets demonstrate the superior performance of our model in prediction accuracy, uncertainty quantification, interpretability, and scalability.

LGOct 14, 2024
A Kernelizable Primal-Dual Formulation of the Multilinear Singular Value Decomposition

Frederiek Wesel, Kim Batselier

The ability to express a learning task in terms of a primal and a dual optimization problem lies at the core of a plethora of machine learning methods. For example, Support Vector Machine (SVM), Least-Squares Support Vector Machine (LS-SVM), Ridge Regression (RR), Lasso Regression (LR), Principal Component Analysis (PCA), and more recently Singular Value Decomposition (SVD) have all been defined either in terms of primal weights or in terms of dual Lagrange multipliers. The primal formulation is computationally advantageous in the case of large sample size while the dual is preferred for high-dimensional data. Crucially, said learning problems can be made nonlinear through the introduction of a feature map in the primal problem, which corresponds to applying the kernel trick in the dual. In this paper we derive a primal-dual formulation of the Multilinear Singular Value Decomposition (MLSVD), which recovers as special cases both PCA and SVD. Besides enabling computational gains through the derived primal formulation, we propose a nonlinear extension of the MLSVD using feature maps, which results in a dual problem where a kernel tensor arises. We discuss potential applications in the context of signal analysis and deep learning.

NAJun 25, 2024
Constructing structured tensor priors for Bayesian inverse problems

Kim Batselier

Specifying a prior distribution is an essential part of solving Bayesian inverse problems. The prior encodes a belief on the nature of the solution and this regularizes the problem. In this article we completely characterize a Gaussian prior that encodes the belief that the solution is a structured tensor. We first define the notion of (A,b)-constrained tensors and show that they describe a large variety of different structures such as Hankel, circulant, triangular, symmetric, and so on. Then we completely characterize the Gaussian probability distribution of such tensors by specifying its mean vector and covariance matrix. Furthermore, explicit expressions are proved for the covariance matrix of tensors whose entries are invariant under a permutation. These results unlock a whole new class of priors for Bayesian inverse problems. We illustrate how new kernel functions can be designed and efficiently computed and apply our results on two particular Bayesian inverse problems: completing a Hankel matrix from a few noisy measurements and learning an image classifier of handwritten digits. The effectiveness of the proposed priors is demonstrated for both problems. All applications have been implemented as reactive Pluto notebooks in Julia.

LGMar 28, 2024
Tensor Network-Constrained Kernel Machines as Gaussian Processes

Frederiek Wesel, Kim Batselier

Tensor Networks (TNs) have recently been used to speed up kernel machines by constraining the model weights, yielding exponential computational and storage savings. In this paper we prove that the outputs of Canonical Polyadic Decomposition (CPD) and Tensor Train (TT)-constrained kernel machines recover a Gaussian Process (GP), which we fully characterize, when placing i.i.d. priors over their parameters. We analyze the convergence of both CPD and TT-constrained models, and show how TT yields models exhibiting more GP behavior compared to CPD, for the same number of model parameters. We empirically observe this behavior in two numerical experiments where we respectively analyze the convergence to the GP and the performance at prediction. We thereby establish a connection between TN-constrained kernel machines and GPs.

LGMay 9, 2023
How Informative is the Approximation Error from Tensor Decomposition for Neural Network Compression?

Jetze T. Schuurmans, Kim Batselier, Julian F. P. Kooij

Tensor decompositions have been successfully applied to compress neural networks. The compression algorithms using tensor decompositions commonly minimize the approximation error on the weights. Recent work assumes the approximation error on the weights is a proxy for the performance of the model to compress multiple layers and fine-tune the compressed model. Surprisingly, little research has systematically evaluated which approximation errors can be used to make choices regarding the layer, tensor decomposition method, and level of compression. To close this gap, we perform an experimental study to test if this assumption holds across different layers and types of decompositions, and what the effect of fine-tuning is. We include the approximation error on the features resulting from a compressed layer in our analysis to test if this provides a better proxy, as it explicitly takes the data into account. We find the approximation error on the weights has a positive correlation with the performance error, before as well as after fine-tuning. Basing the approximation error on the features does not improve the correlation significantly. While scaling the approximation error commonly is used to account for the different sizes of layers, the average correlation across layers is smaller than across all choices (i.e. layers, decompositions, and level of compression) before fine-tuning. When calculating the correlation across the different decompositions, the average rank correlation is larger than across all choices. This means multiple decompositions can be considered for compression and the approximation error can be used to choose between them.

LGOct 26, 2021
Tensor Network Kalman Filtering for Large-Scale LS-SVMs

Maximilian Lucassen, Johan A. K. Suykens, Kim Batselier

Least squares support vector machines are a commonly used supervised learning method for nonlinear regression and classification. They can be implemented in either their primal or dual form. The latter requires solving a linear system, which can be advantageous as an explicit mapping of the data to a possibly infinite-dimensional feature space is avoided. However, for large-scale applications, current low-rank approximation methods can perform inadequately. For example, current methods are probabilistic due to their sampling procedures, and/or suffer from a poor trade-off between the ranks and approximation power. In this paper, a recursive Bayesian filtering framework based on tensor networks and the Kalman filter is presented to alleviate the demanding memory and computational complexities associated with solving large-scale dual problems. The proposed method is iterative, does not require explicit storage of the kernel matrix, and allows the formulation of early stopping conditions. Additionally, the framework yields confidence estimates of obtained models, unlike alternative methods. The performance is tested on two regression and three classification experiments, and compared to the Nyström and fixed size LS-SVM methods. Results show that our method can achieve high performance and is particularly useful when alternative methods are computationally infeasible due to a slowly decaying kernel matrix spectrum.

LGSep 3, 2021
Large-Scale Learning with Fourier Features and Tensor Decompositions

Frederiek Wesel, Kim Batselier

Random Fourier features provide a way to tackle large-scale machine learning problems with kernel methods. Their slow Monte Carlo convergence rate has motivated the research of deterministic Fourier features whose approximation error can decrease exponentially in the number of basis functions. However, due to their tensor product extension to multiple dimensions, these methods suffer heavily from the curse of dimensionality, limiting their applicability to one, two or three-dimensional scenarios. In our approach we overcome said curse of dimensionality by exploiting the tensor product structure of deterministic Fourier features, which enables us to represent the model parameters as a low-rank tensor decomposition. We derive a monotonically converging block coordinate descent algorithm with linear complexity in both the sample size and the dimensionality of the inputs for a regularized squared loss function, allowing to learn a parsimonious model in decomposed form using deterministic Fourier features. We demonstrate by means of numerical experiments how our low-rank tensor approach obtains the same performance of the corresponding nonparametric model, consistently outperforming random Fourier features.

LGDec 21, 2020
Alternating linear scheme in a Bayesian framework for low-rank tensor approximation

Clara Menzen, Manon Kok, Kim Batselier

Multiway data often naturally occurs in a tensorial format which can be approximately represented by a low-rank tensor decomposition. This is useful because complexity can be significantly reduced and the treatment of large-scale data sets can be facilitated. In this paper, we find a low-rank representation for a given tensor by solving a Bayesian inference problem. This is achieved by dividing the overall inference problem into sub-problems where we sequentially infer the posterior distribution of one tensor decomposition component at a time. This leads to a probabilistic interpretation of the well-known iterative algorithm alternating linear scheme (ALS). In this way, the consideration of measurement noise is enabled, as well as the incorporation of application-specific prior knowledge and the uncertainty quantification of the low-rank tensor estimate. To compute the low-rank tensor estimate from the posterior distributions of the tensor decomposition components, we present an algorithm that performs the unscented transform in tensor train format.

LGJan 2, 2020
Kernelized Support Tensor Train Machines

Cong Chen, Kim Batselier, Wenjian Yu et al.

Tensor, a multi-dimensional data structure, has been exploited recently in the machine learning community. Traditional machine learning approaches are vector- or matrix-based, and cannot handle tensorial data directly. In this paper, we propose a tensor train (TT)-based kernel technique for the first time, and apply it to the conventional support vector machine (SVM) for image classification. Specifically, we propose a kernelized support tensor train machine that accepts tensorial input and preserves the intrinsic kernel property. The main contributions are threefold. First, we propose a TT-based feature mapping procedure that maintains the TT structure in the feature space. Second, we demonstrate two ways to construct the TT-based kernel function while considering consistency with the TT inner product and preservation of information. Third, we show that it is possible to apply different kernel functions on different data modes. In principle, our method tensorizes the standard SVM on its input structure and kernel mapping scheme. Extensive experiments are performed on real-world tensor data, which demonstrates the superiority of the proposed scheme under few-sample high-dimensional inputs.

CVNov 12, 2018
Matrix Product Operator Restricted Boltzmann Machines

Cong Chen, Kim Batselier, Ching-Yun Ko et al.

A restricted Boltzmann machine (RBM) learns a probability distribution over its input samples and has numerous uses like dimensionality reduction, classification and generative modeling. Conventional RBMs accept vectorized data that dismisses potentially important structural information in the original tensor (multi-way) input. Matrix-variate and tensor-variate RBMs, named MvRBM and TvRBM, have been proposed but are all restrictive by model construction, which leads to a weak model expression power. This work presents the matrix product operator RBM (MPORBM) that utilizes a tensor network generalization of Mv/TvRBM, preserves input formats in both the visible and hidden layers, and results in higher expressive power. A novel training algorithm integrating contrastive divergence and an alternating optimization procedure is also developed. Numerical experiments compare the MPORBM with the traditional RBM and MvRBM for data classification and image completion and denoising tasks. The expressive power of the MPORBM as a function of the MPO-rank is also investigated.

LGNov 9, 2018
Deep Compression of Sum-Product Networks on Tensor Networks

Ching-Yun Ko, Cong Chen, Yuke Zhang et al.

Sum-product networks (SPNs) represent an emerging class of neural networks with clear probabilistic semantics and superior inference speed over graphical models. This work reveals a strikingly intimate connection between SPNs and tensor networks, thus leading to a highly efficient representation that we call tensor SPNs (tSPNs). For the first time, through mapping an SPN onto a tSPN and employing novel optimization techniques, we demonstrate remarkable parameter compression with negligible loss in accuracy.

NAApr 17, 2018
Fast and Accurate Tensor Completion with Total Variation Regularized Tensor Trains

Ching-Yun Ko, Kim Batselier, Wenjian Yu et al.

We propose a new tensor completion method based on tensor trains. The to-be-completed tensor is modeled as a low-rank tensor train, where we use the known tensor entries and their coordinates to update the tensor train. A novel tensor train initialization procedure is proposed specifically for image and video completion, which is demonstrated to ensure fast convergence of the completion algorithm. The tensor train framework is also shown to easily accommodate Total Variation and Tikhonov regularization due to their low-rank tensor train representations. Image and video inpainting experiments verify the superiority of the proposed scheme in terms of both speed and scalability, where a speedup of up to 155X is observed compared to state-of-the-art tensor completion methods at a similar accuracy. Moreover, we demonstrate the proposed scheme is especially advantageous over existing algorithms when only tiny portions (say, 1%) of the to-be-completed images/videos are known.

LGApr 17, 2018
A Support Tensor Train Machine

Cong Chen, Kim Batselier, Ching-Yun Ko et al.

There has been growing interest in extending traditional vector-based machine learning techniques to their tensor forms. An example is the support tensor machine (STM) that utilizes a rank-one tensor to capture the data structure, thereby alleviating the overfitting and curse of dimensionality problems in the conventional support vector machine (SVM). However, the expressive power of a rank-one tensor is restrictive for many real-world data. To overcome this limitation, we introduce a support tensor train machine (STTM) by replacing the rank-one tensor in an STM with a tensor train. Experiments validate and confirm the superiority of an STTM over the SVM and STM.

SYAug 17, 2017
Matrix output extension of the tensor network Kalman filter with an application in MIMO Volterra system identification

Kim Batselier, Ngai Wong

This article extends the tensor network Kalman filter to matrix outputs with an application in recursive identification of discrete-time nonlinear multiple-input-multiple-output (MIMO) Volterra systems. This extension completely supersedes previous work, where only $l$ scalar outputs were considered. The Kalman tensor equations are modified to accommodate for matrix outputs and their implementation using tensor networks is discussed. The MIMO Volterra system identification application requires the conversion of the output model matrix with a row-wise Kronecker product structure into its corresponding tensor network, for which we propose an efficient algorithm. Numerical experiments demonstrate both the efficacy of the proposed matrix conversion algorithm and the improved convergence of the Volterra kernel estimates when using matrix outputs.

LGDec 20, 2016
Parallelized Tensor Train Learning of Polynomial Classifiers

Zhongming Chen, Kim Batselier, Johan A. K. Suykens et al.

In pattern classification, polynomial classifiers are well-studied methods as they are capable of generating complex decision surfaces. Unfortunately, the use of multivariate polynomials is limited to kernels as in support vector machines, because polynomials quickly become impractical for high-dimensional problems. In this paper, we effectively overcome the curse of dimensionality by employing the tensor train format to represent a polynomial classifier. Based on the structure of tensor trains, two learning algorithms are proposed which involve solving different optimization problems of low computational complexity. Furthermore, we show how both regularization to prevent overfitting and parallelization, which enables the use of large training sets, are incorporated into these methods. Both the efficiency and efficacy of our tensor-based polynomial classifier are then demonstrated on the two popular datasets USPS and MNIST.