LG ML · Jan 15, 2024

Efficient Nonparametric Tensor Decomposition for Binary and Count Data

arXiv:2401.07711v17.97 citations · h-index: 5 · Has Code · AAAI

Originality: Highly original

AI Analysis

This work addresses the challenge of analyzing discrete data in high-dimensional tensors for applications like recommendation systems or event modeling, offering a more effective method than existing approaches.

The paper tackles tensor decomposition for binary and count data, which traditional Gaussian-based methods handle poorly. It proposes ENTED, a nonparametric approach based on Gaussian processes and variational inference, and reports better performance and computational advantages on real-world tensor completion tasks.

In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-linear structures, such as the CP and Tucker formats, and may therefore not be effective enough to handle complex real-world datasets. To address these issues, we propose ENTED, an Efficient Nonparametric TEnsor Decomposition for binary and count tensors. Specifically, we first employ a nonparametric Gaussian process (GP) to replace traditional multi-linear structures. Next, we utilize the Pólya-Gamma augmentation, which provides a unified framework to establish conjugate models for binary and count distributions. Finally, to address the computational cost of GPs, we enhance the model by incorporating sparse orthogonal variational inference of inducing points, which offers a more effective covariance approximation within GPs and stochastic natural gradient updates for nonparametric models. We evaluate our model on several real-world tensor completion tasks, considering binary and count datasets. The results demonstrate both better performance and computational advantages of the proposed model.
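
For readers unfamiliar with the augmentation step, the standard Pólya-Gamma identity is sketched below. This is the general result from the Pólya-Gamma literature, not a transcription of the paper's own derivation. The symbols $\psi$ (the latent GP function value at a tensor entry), $y$ (the observed entry), and $r$ (a dispersion parameter) are notation assumed here for illustration, and the negative binomial is shown only as one common choice of count likelihood.

```latex
% Polya-Gamma identity: for \omega \sim \mathrm{PG}(b, 0) and \kappa = a - b/2,
\[
  \frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}}
  = 2^{-b}\, e^{\kappa \psi}
    \int_{0}^{\infty} e^{-\omega \psi^{2}/2}\, p(\omega)\, \mathrm{d}\omega .
\]

% Binary entry (Bernoulli likelihood with logit \psi): a = y, b = 1
\[
  p(y \mid \psi) = \frac{e^{y\psi}}{1 + e^{\psi}},
  \qquad \kappa = y - \tfrac{1}{2} .
\]

% Count entry (assumed negative binomial likelihood with dispersion r): a = y, b = y + r
\[
  p(y \mid \psi) \propto \frac{(e^{\psi})^{y}}{(1 + e^{\psi})^{y + r}},
  \qquad \kappa = \frac{y - r}{2} .
\]

% Conditioned on \omega, both likelihoods are Gaussian in \psi:
\[
  p(y \mid \psi, \omega) \propto
  \exp\!\left( \kappa \psi - \tfrac{1}{2}\, \omega\, \psi^{2} \right) .
\]
```

Because the conditional likelihood is quadratic in $\psi$, it is conjugate to a Gaussian process prior on $\psi$, which is what lets binary and count tensors share one inference scheme.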
