Tomoyuki Kubota

h-index: 11

3 papers

68 citations

Novelty: 43%

AI Score: 34

Ranked #112,792 of 194,257 authors (top 58%); #24,816 in LG (top 62%)

3 Papers

7.8 | NE | Sep 5, 2024

How noise affects memory in linear recurrent networks

JingChuan Guan, Tomoyuki Kubota, Yasuo Kuniyoshi et al.

The effects of noise on memory in a linear recurrent network are theoretically investigated. Memory is characterized by the network's ability to store previous inputs in its instantaneous state while it receives correlated or uncorrelated noise. Two major properties are revealed: First, the memory reduced by noise is uniquely determined by the noise's power spectral density (PSD). Second, the memory will not decrease regardless of noise intensity if the PSD is in a certain class of distributions (including power laws). The results are verified using human brain signals, showing good agreement.
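The notion of memory described in this abstract can be illustrated numerically. The following is a minimal sketch, not the authors' method: it assumes a random linear recurrent network, uniform i.i.d. input, white (uncorrelated) Gaussian state noise, and a standard delay-wise memory function computed as the squared correlation between a least-squares linear readout and the delayed input. The function name `memory_function` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def memory_function(n_nodes=20, n_steps=5000, max_delay=30,
                    noise_std=0.0, rho=0.9, washout=200, seed=0):
    """Estimate how well a noisy linear recurrent network stores past inputs.

    Returns an array whose entry k-1 is the squared correlation between
    the delayed input u(t-k) and its best linear reconstruction from the
    state x(t). All settings here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Random recurrent weights rescaled to spectral radius rho (< 1 for stability).
    W = rng.standard_normal((n_nodes, n_nodes))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.standard_normal(n_nodes)

    u = rng.uniform(-1.0, 1.0, n_steps)                          # input series
    noise = noise_std * rng.standard_normal((n_steps, n_nodes))  # white state noise

    # Linear state update: x(t) = W x(t-1) + w_in u(t) + noise(t)
    x = np.zeros(n_nodes)
    states = np.empty((n_steps, n_nodes))
    for t in range(n_steps):
        x = W @ x + w_in * u[t] + noise[t]
        states[t] = x

    X = states[washout:]  # discard the initial transient
    mc = []
    for k in range(1, max_delay + 1):
        target = u[washout - k:n_steps - k]  # u(t-k), aligned with x(t)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        pred = X @ coef
        mc.append(np.corrcoef(pred, target)[0, 1] ** 2)
    return np.array(mc)

clean = memory_function(noise_std=0.0)
noisy = memory_function(noise_std=1.0)
print("total memory without noise:", clean.sum())
print("total memory with white noise:", noisy.sum())
```

This sketch covers only white noise, whose PSD is flat. Testing the abstract's second claim would require generating noise with a prescribed PSD, such as a power law, which is not attempted here.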

4.1 | LG | Oct 1, 2025

Memory Determines Learning Direction: A Theory of Gradient-Based Optimization in State Space Models

JingChuan Guan, Tomoyuki Kubota, Yasuo Kuniyoshi et al.

State space models (SSMs) have gained attention by showing potential to outperform Transformers. However, previous studies have not sufficiently addressed the mechanisms underlying their high performance owing to a lack of theoretical explanation of SSMs' learning dynamics. In this study, we provide such an explanation and propose an improved training strategy. The memory capacity of SSMs can be evaluated by examining how input time series are stored in their current state. Such an examination reveals a tradeoff between memory accuracy and length, as well as the theoretical equivalence between the structured state space sequence model (S4) and a simplified S4 with diagonal recurrent weights. This theoretical foundation allows us to elucidate the learning dynamics, proving the importance of initial parameters. Our analytical results suggest that successful learning requires the initial memory structure to be the longest possible even if memory accuracy may deteriorate or the gradient lose the teacher information. Experiments on tasks requiring long memory confirmed that extending memory is difficult, emphasizing the importance of initialization. Furthermore, we found that fixing recurrent weights can be more advantageous than adapting them because it achieves comparable or even higher performance with faster convergence. Our results provide a new theoretical foundation for SSMs and potentially offer a novel optimization strategy.

11.1 | LG | Jun 11, 2019

A Unifying Framework for Information Processing in Stochastically Driven Dynamical Systems

Tomoyuki Kubota, Hirokazu Takahashi, Kohei Nakajima

A dynamical system can be regarded as an information processing apparatus that encodes input streams from the external environment to its state and processes them through state transitions. The information processing capacity (IPC) is an excellent tool that comprehensively evaluates these processed inputs, providing details of unknown information processing in black box systems; however, this measure can be applied only to time-invariant systems. This paper extends the applicable range to time-variant systems and further reveals that the IPC is equivalent to coefficients of polynomial chaos (PC) expansion in more general dynamical systems. To achieve this objective, we tackle three issues. First, we establish a connection between the IPC for time-invariant systems and PC expansion, which is a type of polynomial expansion using orthogonal functions of input history as bases. We prove that the IPC corresponds to the squared norm of the coefficient vector of the basis in the PC expansion. Second, we show that an input following an arbitrary distribution can be used for the IPC, removing previous restrictions to specific input distributions. Third, we extend the conventional orthogonal bases to functions of both time and input history and propose the IPC for time-variant systems. To show the significance of our approach, we demonstrate that our measure can reveal information representations not only in machine learning networks but also in a real, cultured neural network. Our generalized measure paves the way for unveiling the information processing capabilities of a wide variety of physical dynamics that have been left behind in nature.