Naoki Saito

CV
h-index26
14papers
88citations
Novelty49%
AI Score39

14 Papers

NAAug 21, 2012
Mysteries around the graph Laplacian eigenvalue 4

Yuji Nakatsukasa, Naoki Saito, Ernest Woei

We describe our current understanding on the phase transition phenomenon of the graph Laplacian eigenvectors constructed on a certain type of unweighted trees, which we previously observed through our numerical experiments. The eigenvalue distribution for such a tree is a smooth bell-shaped curve starting from the eigenvalue 0 up to 4. Then, at the eigenvalue 4, there is a sudden jump. Interestingly, the eigenvectors corresponding to the eigenvalues below 4 are semi-global oscillations (like Fourier modes) over the entire tree or one of the branches; on the other hand, those corresponding to the eigenvalues above 4 are much more localized and concentrated (like wavelets) around junctions/branching vertices. For a special class of trees called starlike trees, we obtain a complete understanding of such phase transition phenomenon. For a general graph, we prove the number of the eigenvalues larger than 4 is bounded from above by the number of vertices whose degrees is strictly higher than 2. Moreover, we also prove that if a graph contains a branching path, then the magnitudes of the components of any eigenvector corresponding to the eigenvalue greater than 4 decay exponentially from the branching vertex toward the leaf of that branch.

SPSep 26, 2013
On Rayleigh-Type Formulas for a Non-local Boundary Value Problem Associated with an Integral Operator Commuting with the Laplacian

Lotfi Hermi, Naoki Saito

In this article we prove the existence, uniqueness, and simplicity of a negative eigenvalue for a class of integral operators whose kernel is of the form $|x-y|^ρ$, $0 < ρ\leq 1$, $x, y \in [-a, a]$. We also provide two different ways of producing recursive formulas for the Rayleigh functions (i.e., recursion formulas for power sums) of the eigenvalues of this integral operator when $ρ=1$, providing means of approximating this negative eigenvalue. These methods offer recursive procedures for dealing with the eigenvalues of a one-dimensional Laplacian with non-local boundary conditions which commutes with an integral operator having a harmonic kernel. The problem emerged in recent work by one of the authors [45]. We also discuss extensions in higher dimensions and links with distance matrices.

LGSep 23, 2025
Multiscale Hodge Scattering Networks for Data Analysis

Naoki Saito, Stefan C. Schonsheck, Eugene Shvarts

We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction builds on multiscale basis dictionaries on simplicial complexes -- namely, the $κ$-GHWT and $κ$-HGLET -- which we recently developed for simplices of dimension $κ\in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar--Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). Both the $κ$-GHWT and the $κ$-HGLET form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs adopt a layered structure analogous to a convolutional neural network (CNN), cascading the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation -- akin to local pooling in CNNs -- that can be performed either locally or per scale. Such pooling operations are more difficult to define in traditional scattering networks based on Morlet wavelets and in geometric scattering networks based on Diffusion Wavelets. As a result, our approach extracts a rich set of descriptive yet robust features that can be combined with simple machine learning models (e.g., logistic regression or support vector machines) to achieve high-accuracy classification with far fewer trainable parameters than most modern graph neural networks require. Finally, we demonstrate the effectiveness of MHSNs on three distinct problem types: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.

LGNov 17, 2023
Multiscale Hodge Scattering Networks for Data Analysis

Naoki Saito, Stefan C. Schonsheck, Eugene Shvarts

We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction builds on multiscale basis dictionaries on simplicial complexes -- namely, the $κ$-GHWT and $κ$-HGLET -- which we recently developed for simplices of dimension $κ\in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar--Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). Both the $κ$-GHWT and the $κ$-HGLET form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs adopt a layered structure analogous to a convolutional neural network (CNN), cascading the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation -- akin to local pooling in CNNs -- that can be performed either locally or per scale. Such pooling operations are more difficult to define in traditional scattering networks based on Morlet wavelets and in geometric scattering networks based on Diffusion Wavelets. As a result, our approach extracts a rich set of descriptive yet robust features that can be combined with simple machine learning models (e.g., logistic regression or support vector machines) to achieve high-accuracy classification with far fewer trainable parameters than most modern graph neural networks require. Finally, we demonstrate the effectiveness of MHSNs on three distinct problem types: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.

ASJun 16, 2022
The Scattering Transform Network with Generalized Morse Wavelets and Its Application to Music Genre Classification

Wai Ho Chak, Naoki Saito, David Weber

We propose to use the Generalized Morse Wavelets (GMWs) instead of commonly-used Morlet (or Gabor) wavelets in the Scattering Transform Network (STN), which we call the GMW-STN, for signal classification problems. The GMWs form a parameterized family of truly analytic wavelets while the Morlet wavelets are only approximately analytic. The analyticity of underlying wavelet filters in the STN is particularly important for nonstationary oscillatory signals such as music signals because it improves interpretability of the STN representations by providing multiscale amplitude and phase (and consequently frequency) information of input signals. We demonstrate the superiority of the GMW-STN over the conventional STN in music genre classification using the so-called GTZAN database. Moreover, we show the performance improvement of the GMW-STN by increasing its number of layers to three over the typical two-layer STN.}

LGFeb 8, 2025
Explainable and Class-Revealing Signal Feature Extraction via Scattering Transform and Constrained Zeroth-Order Optimization

Naoki Saito, David Weber

We propose a new method to extract discriminant and explainable features from a particular machine learning model, i.e., a combination of the scattering transform and the multiclass logistic regression. Although this model is well-known for its ability to learn various signal classes with high classification rate, it remains elusive to understand why it can generate such successful classification, mainly due to the nonlinearity of the scattering transform. In order to uncover the meaning of the scattering transform coefficients selected by the multiclass logistic regression (with the Lasso penalty), we adopt zeroth-order optimization algorithms to search an input pattern that maximizes the class probability of a class of interest given the learned model. In order to do so, it turns out that imposing sparsity and smoothness of input patterns is important. We demonstrate the effectiveness of our proposed method using a couple of synthetic time-series classification problems.

CLAug 30, 2025
Discrete Prompt Tuning via Recursive Utilization of Black-box Multimodal Large Language Model for Personalized Visual Emotion Recognition

Ryo Takahashi, Naoki Saito, Keisuke Maeda et al.

Visual Emotion Recognition (VER) is an important research topic due to its wide range of applications, including opinion mining and advertisement design. Extending this capability to recognize emotions at the individual level further broadens its potential applications. Recently, Multimodal Large Language Models (MLLMs) have attracted increasing attention and demonstrated performance comparable to that of conventional VER methods. However, MLLMs are trained on large and diverse datasets containing general opinions, which causes them to favor majority viewpoints and familiar patterns. This tendency limits their performance in a personalized VER, which is crucial for practical and real-world applications, and indicates a key area for improvement. To address this limitation, the proposed method employs discrete prompt tuning inspired by the process of humans' prompt engineering to adapt the VER task to each individual. Our method selects the best natural language representation from the generated prompts and uses it to update the prompt for the realization of accurate personalized VER.

CVJan 22, 2025
Triplet Synthesis For Enhancing Composed Image Retrieval via Counterfactual Image Generation

Kenta Uesugi, Naoki Saito, Keisuke Maeda et al.

Composed Image Retrieval (CIR) provides an effective way to manage and access large-scale visual data. Construction of the CIR model utilizes triplets that consist of a reference image, modification text describing desired changes, and a target image that reflects these changes. For effectively training CIR models, extensive manual annotation to construct high-quality training datasets, which can be time-consuming and labor-intensive, is required. To deal with this problem, this paper proposes a novel triplet synthesis method by leveraging counterfactual image generation. By controlling visual feature modifications via counterfactual image generation, our approach automatically generates diverse training triplets without any manual intervention. This approach facilitates the creation of larger and more expressive datasets, leading to the improvement of CIR model's performance.

IVFeb 25, 2022
Monogenic Wavelet Scattering Network for Texture Image Classification

Wai Ho Chak, Naoki Saito

The scattering transform network (STN), which has a similar structure as that of a popular convolutional neural network except its use of predefined convolution filters and a small number of layers, can generates a robust representation of an input signal relative to small deformations. We propose a novel Monogenic Wavelet Scattering Network (MWSN) for 2D texture image classification through a cascade of monogenic wavelet filtering with nonlinear modulus and averaging operators by replacing the 2D Morlet wavelet filtering in the standard STN. Our MWSN can extract useful hierarchical and directional features with interpretable coefficients, which can be further compressed by PCA and fed into a classifier. Using the CUReT texture image database, we demonstrate the superior performance of our MWSN over the standard STN. This performance improvement can be explained by the natural extension of 1D analyticity to 2D monogenicity.

SPJul 11, 2021
eGHWT: The Extended Generalized Haar-Walsh Transform

Naoki Saito, Yiqun Shao

Extending computational harmonic analysis tools from the classical setting of regular lattices to the more general setting of graphs and networks is very important and much research has been done recently. The Generalized Haar-Walsh Transform (GHWT) developed by Irion and Saito (2014) is a multiscale transform for signals on graphs, which is a generalization of the classical Haar and Walsh-Hadamard Transforms. We propose the extended Generalized Haar-Walsh Transform (eGHWT), which is a generalization of the adapted time-frequency tilings of Thiele and Villemoes (1996). The eGHWT examines not only the efficiency of graph-domain partitions but also that of "sequency-domain" partitions simultaneously. Consequently, the eGHWT and its associated best-basis selection algorithm for graph signals significantly improve the performance of the previous GHWT with the similar computational cost, $O(N \log N)$, where $N$ is the number of nodes of an input graph. While the GHWT best-basis algorithm seeks the most suitable orthonormal basis for a given task among more than $(1.5)^N$ possible orthonormal bases in $\mathbb{R}^N$, the eGHWT best-basis algorithm can find a better one by searching through more than $0.618\cdot(1.84)^N$ possible orthonormal bases in $\mathbb{R}^N$. This article describes the details of the eGHWT best-basis algorithm and demonstrates its superiority using several examples including genuine graph signals as well as conventional digital images viewed as graph signals. Furthermore, we also show how the eGHWT can be extended to 2D signals and matrix-form data by viewing them as a tensor product of graphs generated from their columns and rows and demonstrate its effectiveness on applications such as image approximation.

CVJan 9, 2019
The Use of Mutual Coherence to Prove $\ell^1/\ell^0$-Equivalence in Classification Problems

Chelsea Weaver, Naoki Saito

We consider the decomposition of a signal over an overcomplete set of vectors. Minimization of the $\ell^1$-norm of the coefficient vector can often retrieve the sparsest solution (so-called "$\ell^1/\ell^0$-equivalence"), a generally NP-hard task, and this fact has powered the field of compressed sensing. Wright et al.'s sparse representation-based classification (SRC) applies this relationship to machine learning, wherein the signal to be decomposed represents the test sample and columns of the dictionary are training samples. We investigate the relationships between $\ell^1$-minimization, sparsity, and classification accuracy in SRC. After proving that the tractable, deterministic approach to verifying $\ell^1/\ell^0$-equivalence fundamentally conflicts with the high coherence between same-class training samples, we demonstrate that $\ell^1$-minimization can still recover the sparsest solution when the classes are well-separated. Further, using a nonlinear transform so that sparse recovery conditions may be satisfied, we demonstrate that approximate (not strict) equivalence is key to the success of SRC.

NAJul 11, 2017
Adaptive synchrosqueezing based on a quilted short-time Fourier transform

Alexander Berrian, Naoki Saito

In recent years, the synchrosqueezing transform (SST) has gained popularity as a method for the analysis of signals that can be broken down into multiple components determined by instantaneous amplitudes and phases. One such version of SST, based on the short-time Fourier transform (STFT), enables the sharpening of instantaneous frequency (IF) information derived from the STFT, as well as the separation of amplitude-phase components corresponding to distinct IF curves. However, this SST is limited by the time-frequency resolution of the underlying window function, and may not resolve signals exhibiting diverse time-frequency behaviors with sufficient accuracy. In this work, we develop a framework for an SST based on a "quilted" short-time Fourier transform (SST-QSTFT), which allows adaptation to signal behavior in separate time-frequency regions through the use of multiple windows. This motivates us to introduce a discrete reassignment frequency formula based on a finite difference of the phase spectrum, ensuring computational accuracy for a wider variety of windows. We develop a theoretical framework for the SST-QSTFT in both the continuous and the discrete settings, and describe an algorithm for the automatic selection of optimal windows depending on the region of interest. Using synthetic data, we demonstrate the superior numerical performance of SST-QSTFT relative to other SST methods in a noisy context. Finally, we apply SST-QSTFT to audio recordings of animal calls to demonstrate the potential of our method for the analysis of real bioacoustic signals.

CVJul 11, 2017
Underwater object classification using scattering transform of sonar signals

Naoki Saito, David S. Weber

In this paper, we apply the scattering transform (ST), a nonlinear map based off of a convolutional neural network (CNN), to classification of underwater objects using sonar signals. The ST formalizes the observation that the filters learned by a CNN have wavelet like structure. We achieve effective binary classification both on a real dataset of Unexploded Ordinance (UXOs), as well as synthetically generated examples. We also explore the effects on the waveforms with respect to changes in the object domain (e.g., translation, rotation, and acoustic impedance, etc.), and examine the consequences coming from theoretical results for the scattering transform. We show that the scattering transform is capable of excellent classification on both the synthetic and real problems, thanks to having more quasi-invariance properties that are well-suited to translation and rotation of the object.

CVJul 4, 2016
Improving Sparse Representation-Based Classification Using Local Principal Component Analysis

Chelsea Weaver, Naoki Saito

Sparse representation-based classification (SRC), proposed by Wright et al., seeks the sparsest decomposition of a test sample over the dictionary of training samples, with classification to the most-contributing class. Because it assumes test samples can be written as linear combinations of their same-class training samples, the success of SRC depends on the size and representativeness of the training set. Our proposed classification algorithm enlarges the training set by using local principal component analysis to approximate the basis vectors of the tangent hyperplane of the class manifold at each training sample. The dictionary in SRC is replaced by a local dictionary that adapts to the test sample and includes training samples and their corresponding tangent basis vectors. We use a synthetic data set and three face databases to demonstrate that this method can achieve higher classification accuracy than SRC in cases of sparse sampling, nonlinear class manifolds, and stringent dimension reduction.