Soma Yokoi

3 papers

6 citations

Novelty 52%

AI Score 25

Ranked #172,535 of 205,806 authors (top 84%); #2,804 in ML (top 80%)

3 Papers

ML · Jun 18, 2024

Top-Down Bayesian Posterior Sampling for Sum-Product Networks

Soma Yokoi, Issei Sato

Sum-product networks (SPNs) are probabilistic models characterized by exact and fast evaluation of fundamental probabilistic operations. Their superior computational tractability has led to applications in many fields, such as machine learning with time constraints or accuracy requirements and real-time systems. The structural constraints of SPNs that support fast inference, however, increase learning-time complexity and can be an obstacle to building highly expressive SPNs. This study aimed to develop a Bayesian learning approach that can be implemented efficiently on large-scale SPNs. We derived a new full conditional probability for Gibbs sampling by marginalizing multiple random variables, which allows the posterior distribution to be obtained quickly. The complexity analysis revealed that our sampling algorithm works efficiently even for the largest possible SPN. Furthermore, we proposed a hyperparameter tuning method that balances the diversity of the prior distribution and optimization efficiency in large-scale SPNs. Our method improves learning-time complexity and, in numerical experiments on more than 20 datasets, ran tens to more than one hundred times faster while achieving superior predictive performance.

ML · Nov 20, 2019

Bayesian interpretation of SGD as Ito process

Soma Yokoi, Issei Sato

The current interpretation of stochastic gradient descent (SGD) as a stochastic process lacks generality in that its numerical scheme restricts continuous-time dynamics as well as the loss function and the distribution of gradient noise. We introduce a simplified scheme with milder conditions that flexibly interprets SGD as a discrete-time approximation of an Ito process. The scheme also serves as a common foundation for SGD and stochastic gradient Langevin dynamics (SGLD), providing insights into their asymptotic properties. We investigate the convergence of SGD with a biased gradient in terms of the equilibrium mode, as well as the overestimation of the second moment in SGLD.

ML · Mar 7, 2019

On Transformations in Stochastic Gradient MCMC

Soma Yokoi, Takuma Otsuka, Issei Sato

Stochastic gradient Langevin dynamics (SGLD) is a computationally efficient sampler for Bayesian posterior inference given a large-scale dataset. Although SGLD is designed for unbounded random variables, many practical models incorporate bounded variables, such as non-negative ones or those in a finite interval. To bridge this gap, we consider mapping unbounded samples into the target interval. This paper shows, both theoretically and empirically, that several mapping approaches commonly used in the literature produce erroneous samples. We show that a change of random variable using an invertible Lipschitz mapping function overcomes this pitfall and attains weak convergence. Experiments demonstrate its efficacy for widely used models with bounded latent variables, including Bayesian non-negative matrix factorization and binary neural networks.