LGFeb 12, 2023
Koopman-based generalization bound: New aspect for full-rank weightsYuka Hashimoto, Sho Sonoda, Isao Ishikawa et al.
We propose a new bound for generalization of neural networks using Koopman operators. Whereas most of existing works focus on low-rank weight matrices, we focus on full-rank weight matrices. Our bound is tighter than existing norm-based bounds when the condition numbers of weight matrices are small. Especially, it is completely independent of the width of the network if the weight matrices are orthogonal. Our bound does not contradict to the existing bounds but is a complement to the existing bounds. As supported by several existing empirical results, low-rankness is not the only reason for generalization. Furthermore, our bound can be combined with the existing bounds to obtain a tighter bound. Our result sheds new light on understanding generalization of neural networks with full-rank weight matrices, and it provides a connection between operator-theoretic analysis and generalization of neural networks.
MLOct 21, 2022
Learning in RKHM: a $C^*$-Algebraic Twist for Kernel MachinesYuka Hashimoto, Masahiro Ikeda, Hachem Kadri
Supervised learning in reproducing kernel Hilbert space (RKHS) and vector-valued RKHS (vvRKHS) has been investigated for more than 30 years. In this paper, we provide a new twist to this rich literature by generalizing supervised learning in RKHS and vvRKHS to reproducing kernel Hilbert $C^*$-module (RKHM), and show how to construct effective positive-definite kernels by considering the perspective of $C^*$-algebra. Unlike the cases of RKHS and vvRKHS, we can use $C^*$-algebras to enlarge representation spaces. This enables us to construct RKHMs whose representation power goes beyond RKHSs, vvRKHSs, and existing methods such as convolutional neural networks. Our framework is suitable, for example, for effectively analyzing image data by allowing the interaction of Fourier components.
MLJun 20, 2022
$C^*$-algebra Net: A New Approach Generalizing Neural Network Parameters to $C^*$-algebraYuka Hashimoto, Zhao Wang, Tomoko Matsui
We propose a new framework that generalizes the parameters of neural network models to $C^*$-algebra-valued ones. $C^*$-algebra is a generalization of the space of complex numbers. A typical example is the space of continuous functions on a compact space. This generalization enables us to combine multiple models continuously and use tools for functions such as regression and integration. Consequently, we can learn features of data efficiently and adapt the models to problems continuously. We apply our framework to practical problems such as density estimation and few-shot learning and show that our framework enables us to learn features of data even with a limited number of samples. Our new framework highlights the potential possibility of applying the theory of $C^*$-algebra to general neural network models.
OAJan 26, 2023
Noncommutative $C^*$-algebra Net: Learning Neural Networks with Powerful Product Structure in $C^*$-algebraRyuichiro Hataya, Yuka Hashimoto
We propose a new generalization of neural network parameter spaces with noncommutative $C^*$-algebra, which possesses a rich noncommutative structure of products. We show that this noncommutative structure induces powerful effects in learning neural networks. Our framework has a wide range of applications, such as learning multiple related neural networks simultaneously with interactions and learning equivariant features with respect to group actions. Numerical experiments illustrate the validity of our framework and its potential power.
LGOct 5, 2023
Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep NetworksSho Sonoda, Yuka Hashimoto, Isao Ishikawa et al.
We identify hidden layers inside a deep neural network (DNN) with group actions on the data domain, and formulate a formal deep network as a dual voice transform with respect to the Koopman operator, a linear representation of the group action. Based on the group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of DNNs.
24.0LGMay 13
Unified generalization analysis for physics informed neural networksYuka Hashimoto, Tomoharu Iwata
Physics-Informed Neural Networks (PINNs) and their variational counterparts (VPINNs) are neural networks that incorporate physical laws, making them useful for scientific problems. Existing generalization analyses for PINNs and VPINNs remain limited, often requiring restrictive assumptions such as stability conditions or linear ellipticity. In this paper, we derive generalization bounds for neural networks that involve differentiation with respect to input variables, covering PINNs and VPINNs under a unified framework. We apply Taylor expansion to represent nonlinear differential operators as linear operators on a high-dimensional space, enabling the use of Koopman-based analysis and showing that high-rank networks can generalize well even in settings involving differential operators. We also show that the nonlinearity of the differential operator exponentially enlarges the bound, highlighting its significant impact on generalization.
DSMar 4, 2024
Koopman operators with intrinsic observables in rigged reproducing kernel Hilbert spacesIsao Ishikawa, Yuka Hashimoto, Masahiro Ikeda et al.
This paper presents a novel approach for estimating the Koopman operator defined on a reproducing kernel Hilbert space (RKHS) and its spectra. We propose an estimation method, what we call Jet Dynamic Mode Decomposition (JetDMD), leveraging the intrinsic structure of RKHS and the geometric notion known as jets to enhance the estimation of the Koopman operator. This method refines the traditional Extended Dynamic Mode Decomposition (EDMD) in accuracy, especially in the numerical estimation of eigenvalues. This paper proves JetDMD's superiority through explicit error bounds and convergence rate for special positive definite kernels, offering a solid theoretical foundation for its performance. We also delve into the spectral analysis of the Koopman operator, proposing the notion of extended Koopman operator within a framework of rigged Hilbert space. This notion leads to a deeper understanding of estimated Koopman eigenfunctions and capturing them outside the original function space. Through the theory of rigged Hilbert space, our study provides a principled methodology to analyze the estimated spectrum and eigenfunctions of Koopman operators, and enables eigendecomposition within a rigged RKHS. We also propose a new effective method for reconstructing the dynamical system from temporally-sampled trajectory data of the dynamical system with solid theoretical guarantee. We conduct several numerical simulations using the van der Pol oscillator, the Duffing oscillator, the Hénon map, and the Lorenz attractor, and illustrate the performance of JetDMD with clear numerical computations of eigenvalues and accurate predictions of the dynamical systems.
LGFeb 4, 2024
$C^*$-Algebraic Machine Learning: Moving in a New DirectionYuka Hashimoto, Masahiro Ikeda, Hachem Kadri
Machine learning has a long collaborative tradition with several fields of mathematics, such as statistics, probability and linear algebra. We propose a new direction for machine learning research: $C^*$-algebraic ML $-$ a cross-fertilization between $C^*$-algebra and machine learning. The mathematical concept of $C^*$-algebra is a natural generalization of the space of complex numbers. It enables us to unify existing learning strategies, and construct a new framework for more diverse and information-rich data models. We explain why and how to use $C^*$-algebras in machine learning, and provide technical considerations that go into the design of $C^*$-algebraic learning models in the contexts of kernel methods and neural networks. Furthermore, we discuss open questions and challenges in $C^*$-algebraic ML and give our thoughts for future development and applications.
LGMay 22, 2024
Deep Ridgelet Transform and Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant MachinesSho Sonoda, Yuka Hashimoto, Isao Ishikawa et al.
We present a constructive universal approximation theorem for learning machines equipped with joint-group-equivariant feature maps, called the joint-equivariant machines, based on the group representation theory. ``Constructive'' here indicates that the distribution of parameters is given in a closed-form expression known as the ridgelet transform. Joint-group-equivariance encompasses a broad class of feature maps that generalize classical group-equivariance. Particularly, fully-connected networks are not group-equivariant but are joint-group-equivariant. Our main theorem also unifies the universal approximation theorems for both shallow and deep networks. Until this study, the universality of deep networks has been shown in a different manner from the universality of shallow networks, but our results discuss them on common ground. Now we can understand the approximation schemes of various learning machines in a unified manner. As applications, we show the constructive universal approximation properties of four examples: depth-$n$ joint-equivariant machine, depth-$n$ fully-connected network, depth-$n$ group-convolutional network, and a new depth-$2$ network with quadratic forms whose universality has not been known.
LGSep 26, 2025
Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSsYuka Hashimoto, Sho Sonoda, Isao Ishikawa et al.
We derive a new Rademacher complexity bound for deep neural networks using Koopman operators, group representations, and reproducing kernel Hilbert spaces (RKHSs). The proposed bound describes why the models with high-rank weight matrices generalize well. Although there are existing bounds that attempt to describe this phenomenon, these existing bounds can be applied to limited types of models. We introduce an algebraic representation of neural networks and a kernel function to construct an RKHS to derive a bound for a wider range of realistic models. This work paves the way for the Koopman-based theory for Rademacher complexity bounds to be valid for more practical situations.
QUANT-PHJun 4, 2025
Towards Quantum Operator-Valued KernelsHachem Kadri, Joachim Tomasi, Yuka Hashimoto et al.
Quantum kernels are reproducing kernel functions built using quantum-mechanical principles and are studied with the aim of outperforming their classical counterparts. The enthusiasm for quantum kernel machines has been tempered by recent studies that have suggested that quantum kernels could not offer speed-ups when learning on classical data. However, most of the research in this area has been devoted to scalar-valued kernels in standard classification or regression settings for which classical kernel methods are efficient and effective, leaving very little room for improvement with quantum kernels. This position paper argues that quantum kernel research should focus on more expressive kernel classes. We build upon recent advances in operator-valued kernels, and propose guidelines for investigating quantum kernels. This should help to design a new generation of quantum kernel machines and fully explore their potentials.
LGMay 21, 2025
Why and When Deep is Better than Shallow: An Implementation-Agnostic State-Transition View of Depth SupremacySho Sonoda, Yuka Hashimoto, Isao Ishikawa et al.
Why and when is deep better than shallow? We answer this question in a framework that is agnostic to network implementation. We formulate a deep model as an abstract state-transition semigroup acting on a general metric space, and separate the implementation (e.g., ReLU nets, transformers, and chain-of-thought) from the abstract state transition. We prove a bias-variance decomposition in which the variance depends only on the abstract depth-$k$ network and not on the implementation (Theorem 1). We further split the bounds into output and hidden parts to tie the depth dependence of the variance to the metric entropy of the state-transition semigroup (Theorem 2). We then investigate implementation-free conditions under which the variance grow polynomially or logarithmically with depth (Section 4). Combining these with exponential or polynomial bias decay identifies four canonical bias-variance trade-off regimes (EL/EP/PL/PP) and produces explicit optimal depths $k^\ast$. Across regimes, $k^\ast>1$ typically holds, giving a rigorous form of depth supremacy. The lowest generalization error bound is achieved under the EL regime (exp-decay bias + log-growth variance), explaining why and when deep is better, especially for iterative or hierarchical concept classes such as neural ODEs, diffusion/score-matching models, and chain-of-thought reasoning.
LGApr 9, 2024
Quantum Circuit $C^*$-algebra NetYuka Hashimoto, Ryuichiro Hataya
This paper introduces quantum circuit $C^*$-algebra net, which provides a connection between $C^*$-algebra nets proposed in classical machine learning and quantum circuits. Using $C^*$-algebra, a generalization of the space of complex numbers, we can represent quantum gates as weight parameters of a neural network. By introducing additional parameters, we can induce interaction among multiple circuits constructed by quantum gates. This interaction enables the circuits to share information among them, which contributes to improved generalization performance in machine learning tasks. As an application, we propose to use the quantum circuit $C^*$-algebra net to encode classical data into quantum states, which enables us to integrate classical data into quantum algorithms. Numerical results demonstrate that the interaction among circuits improves performance significantly in image classification, and encoded data by the quantum circuit $C^*$-algebra net are useful for downstream quantum machine learning tasks.
MLMay 23, 2023
Deep Learning with Kernels through RKHM and the Perron-Frobenius OperatorYuka Hashimoto, Masahiro Ikeda, Hachem Kadri
Reproducing kernel Hilbert $C^*$-module (RKHM) is a generalization of reproducing kernel Hilbert space (RKHS) by means of $C^*$-algebra, and the Perron-Frobenius operator is a linear operator related to the composition of functions. Combining these two concepts, we present deep RKHM, a deep learning framework for kernel methods. We derive a new Rademacher generalization bound in this setting and provide a theoretical interpretation of benign overfitting by means of Perron-Frobenius operators. By virtue of $C^*$-algebra, the dependency of the bound on output dimension is milder than existing bounds. We show that $C^*$-algebra is a suitable tool for deep learning with kernels, enabling us to take advantage of the product structure of operators and to provide a clear connection with convolutional neural networks. Our theoretical analysis provides a new lens through which one can design and analyze deep kernel methods.
MLJan 27, 2021
Reproducing kernel Hilbert C*-module and kernel mean embeddingsYuka Hashimoto, Isao Ishikawa, Masahiro Ikeda et al.
Kernel methods have been among the most popular techniques in machine learning, where learning tasks are solved using the property of reproducing kernel Hilbert space (RKHS). In this paper, we propose a novel data analysis framework with reproducing kernel Hilbert $C^*$-module (RKHM) and kernel mean embedding (KME) in RKHM. Since RKHM contains richer information than RKHS or vector-valued RKHS (vvRKHS), analysis with RKHM enables us to capture and extract structural properties in such as functional data. We show a branch of theories for RKHM to apply to data analysis, including the representer theorem, and the injectivity and universality of the proposed KME. We also show RKHM generalizes RKHS and vvRKHS. Then, we provide concrete procedures for employing RKHM and the proposed KME to data analysis.
MLJul 29, 2020
Kernel Mean Embeddings of Von Neumann-Algebra-Valued MeasuresYuka Hashimoto, Isao Ishikawa, Masahiro Ikeda et al.
Kernel mean embedding (KME) is a powerful tool to analyze probability measures for data, where the measures are conventionally embedded into a reproducing kernel Hilbert space (RKHS). In this paper, we generalize KME to that of von Neumann-algebra-valued measures into reproducing kernel Hilbert modules (RKHMs), which provides an inner product and distance between von Neumann-algebra-valued measures. Von Neumann-algebra-valued measures can, for example, encode relations between arbitrary pairs of variables in a multivariate distribution or positive operator-valued measures for quantum mechanics. Thus, this allows us to perform probabilistic analyses explicitly reflected with higher-order interactions among variables, and provides a way of applying machine learning frameworks to problems in quantum mechanics. We also show that the injectivity of the existing KME and the universality of RKHS are generalized to RKHM, which confirms many useful features of the existing KME remain in our generalized KME. And, we investigate the empirical performance of our methods using synthetic and real-world data.
MLMar 2, 2020
Analysis via Orthonormal Systems in Reproducing Kernel Hilbert $C^*$-Modules and ApplicationsYuka Hashimoto, Isao Ishikawa, Masahiro Ikeda et al.
Kernel methods have been among the most popular techniques in machine learning, where learning tasks are solved using the property of reproducing kernel Hilbert space (RKHS). In this paper, we propose a novel data analysis framework with reproducing kernel Hilbert $C^*$-module (RKHM), which is another generalization of RKHS than vector-valued RKHS (vv-RKHS). Analysis with RKHMs enables us to deal with structures among variables more explicitly than vv-RKHS. We show the theoretical validity for the construction of orthonormal systems in Hilbert $C^*$-modules, and derive concrete procedures for orthonormalization in RKHMs with those theoretical properties in numerical computations. Moreover, we apply those to generalize with RKHM kernel principal component analysis and the analysis of dynamical systems with Perron-Frobenius operators. The empirical performance of our methods is also investigated by using synthetic and real-world data.
LGSep 9, 2019
Krylov Subspace Method for Nonlinear Dynamical Systems with Random NoiseYuka Hashimoto, Isao Ishikawa, Masahiro Ikeda et al.
Operator-theoretic analysis of nonlinear dynamical systems has attracted much attention in a variety of engineering and scientific fields, endowed with practical estimation methods using data such as dynamic mode decomposition. In this paper, we address a lifted representation of nonlinear dynamical systems with random noise based on transfer operators, and develop a novel Krylov subspace method for estimating the operators using finite data, with consideration of the unboundedness of operators. For this purpose, we first consider Perron-Frobenius operators with kernel-mean embeddings for such systems. We then extend the Arnoldi method, which is the most classical type of Kryov subspace method, so that it can be applied to the current case. Meanwhile, the Arnoldi method requires the assumption that the operator is bounded, which is not necessarily satisfied for transfer operators on nonlinear systems. We accordingly develop the shift-invert Arnoldi method for Perron-Frobenius operators to avoid this problem. Also, we describe an approach of evaluating predictive accuracy by estimated operators on the basis of the maximum mean discrepancy, which is applicable, for example, to anomaly detection in complex systems. The empirical performance of our methods is investigated using synthetic and real-world healthcare data.
MLMay 31, 2018
Metric on Nonlinear Dynamical Systems with Perron-Frobenius OperatorsIsao Ishikawa, Keisuke Fujii, Masahiro Ikeda et al.
The development of a metric for structural data is a long-term problem in pattern recognition and machine learning. In this paper, we develop a general metric for comparing nonlinear dynamical systems that is defined with Perron-Frobenius operators in reproducing kernel Hilbert spaces. Our metric includes the existing fundamental metrics for dynamical systems, which are basically defined with principal angles between some appropriately-chosen subspaces, as its special cases. We also describe the estimation of our metric from finite data. We empirically illustrate our metric with an example of rotation dynamics in a unit disk in a complex plane, and evaluate the performance with real-world time-series data.