Kazuma Tsuji

NAMay 17, 2021

Sparse solutions of the kernel herding algorithm by improved gradient approximation

Kazuma Tsuji, Ken'ichiro Tanaka

The kernel herding algorithm is used to construct quadrature rules in a reproducing kernel Hilbert space (RKHS). While the computational efficiency of the algorithm and stability of the output quadrature formulas are advantages of this method, the convergence speed of the integration error for a given number of nodes is slow compared to that of other quadrature methods. In this paper, we propose a modified kernel herding algorithm whose framework was introduced in a previous study and aim to obtain sparser solutions while preserving the advantages of standard kernel herding. In the proposed algorithm, the negative gradient is approximated by several vertex directions, and the current solution is updated by moving in the approximate descent direction in each iteration. We show that the convergence speed of the integration error is directly determined by the cosine of the angle between the negative gradient and approximate gradient. Based on this, we propose new gradient approximation algorithms and analyze them theoretically, including through convergence analysis. In numerical experiments, we confirm the effectiveness of the proposed algorithms in terms of sparsity of nodes and computational efficiency. Moreover, we provide a new theoretical analysis of the kernel quadrature rules with fully-corrective weights, which realizes faster convergence speeds than those of previous studies.

MLSep 23, 2020

Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space

Kazuma Tsuji, Taiji Suzuki

Deep learning has achieved notable success in various fields, including image and speech recognition. One of the factors in the successful performance of deep learning is its high feature extraction ability. In this study, we focus on the adaptivity of deep learning; consequently, we treat the variable exponent Besov space, which has a different smoothness depending on the input location $x$. In other words, the difficulty of the estimation is not uniform within the domain. We analyze the general approximation error of the variable exponent Besov space and the approximation and estimation errors of deep learning. We note that the improvement based on adaptivity is remarkable when the region upon which the target function has less smoothness is small and the dimension is large. Moreover, the superiority to linear estimators is shown with respect to the convergence rate of the estimation error.

Kazuma Tsuji

2 Papers