Keisuke Yamazaki

ML
10papers
129citations
Novelty42%
AI Score22

10 Papers

MLJun 22, 2019
Model Bridging: Connection between Simulation Model and Neural Network

Keiichi Kisamori, Keisuke Yamazaki, Yuto Komori et al.

The interpretability of machine learning, particularly for deep neural networks, is crucial for decision making in real-world applications. One approach is replacing the un-interpretable machine learning model with a surrogate model, which has a simple structure for interpretation. Another approach is understanding the target system by using a simulation modeled by human knowledge with interpretable simulation parameters. Recently, simulator calibration has been developed based on kernel mean embedding to estimate the simulation parameters as posterior distributions. Our idea is to use a simulation model as an interpretable surrogate model. However, the computational cost of simulator calibration is high owing to the complexity of the simulation model. Thus, we propose a ''model-bridging'' framework to bridge machine learning models with simulation models by a series of kernel mean embeddings to address these difficulties. The proposed framework enables us to obtain predictions and interpretable simulation parameters simultaneously without the computationally expensive calculations of the simulations. In this study, we apply the proposed framework to essential simulations in the manufacturing industry, such as production simulation and fluid dynamics simulation.

MLSep 21, 2018
Simulator Calibration under Covariate Shift with Kernels

Keiichi Kisamori, Motonobu Kanagawa, Keisuke Yamazaki

We propose a novel calibration method for computer simulators, dealing with the problem of covariate shift. Covariate shift is the situation where input distributions for training and test are different, and ubiquitous in applications of simulations. Our approach is based on Bayesian inference with kernel mean embedding of distributions, and on the use of an importance-weighted reproducing kernel for covariate shift adaptation. We provide a theoretical analysis for the proposed method, including a novel theoretical result for conditional mean embedding, as well as empirical investigations suggesting its effectiveness in practice. The experiments include calibration of a widely used simulator for industrial manufacturing processes, where we also demonstrate how the proposed method may be useful for sensitivity analysis of model parameters.

MLFeb 23, 2018
Kernel Recursive ABC: Point Estimation with Intractable Likelihood

Takafumi Kajihara, Motonobu Kanagawa, Keisuke Yamazaki et al.

We propose a novel approach to parameter estimation for simulator-based statistical models with intractable likelihood. Our proposed method involves recursive application of kernel ABC and kernel herding to the same observed data. We provide a theoretical explanation regarding why the approach works, showing (for the population setting) that, under a certain assumption, point estimates obtained with this method converge to the true parameter, as recursion proceeds. We have conducted a variety of numerical experiments, including parameter estimation for a real-world pedestrian flow simulator, and show that in most cases our method outperforms existing approaches.

MLJul 13, 2016
Effects of Additional Data on Bayesian Clustering

Keisuke Yamazaki

Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity.

MLOct 5, 2015
Bayesian Estimation of Multidimensional Latent Variables and Its Asymptotic Accuracy

Keisuke Yamazaki

Hierarchical learning models, such as mixture models and Bayesian networks, are widely employed for unsupervised learning tasks, such as clustering analysis. They consist of observable and hidden variables, which represent the given data and their hidden generation process, respectively. It has been pointed out that conventional statistical analysis is not applicable to these models, because redundancy of the latent variable produces singularities in the parameter space. In recent years, a method based on algebraic geometry has allowed us to analyze the accuracy of predicting observable variables when using Bayesian estimation. However, how to analyze latent variables has not been sufficiently studied, even though one of the main issues in unsupervised learning is to determine how accurately the latent variable is estimated. A previous study proposed a method that can be used when the range of the latent variable is redundant compared with the model generating data. The present paper extends that method to the situation in which the latent variables have redundant dimensions. We formulate new error functions and derive their asymptotic forms. Calculation of the error functions is demonstrated in two-layered Bayesian networks.

MLAug 25, 2014
Asymptotic Accuracy of Bayesian Estimation for a Single Latent Variable

Keisuke Yamazaki

In data science and machine learning, hierarchical parametric models, such as mixture models, are often used. They contain two kinds of variables: observable variables, which represent the parts of the data that can be directly measured, and latent variables, which represent the underlying processes that generate the data. Although there has been an increase in research on the estimation accuracy for observable variables, the theoretical analysis of estimating latent variables has not been thoroughly investigated. In a previous study, we determined the accuracy of a Bayes estimation for the joint probability of the latent variables in a dataset, and we proved that the Bayes method is asymptotically more accurate than the maximum-likelihood method. However, the accuracy of the Bayes estimation for a single latent variable remains unknown. In the present paper, we derive the asymptotic expansions of the error functions, which are defined by the Kullback-Leibler divergence, for two types of single-variable estimations when the statistical regularity is satisfied. Our results indicate that the accuracies of the Bayes and maximum-likelihood methods are asymptotically equivalent and clarify that the Bayes method is only advantageous for multivariable estimations.

MLAug 9, 2013
Accuracy of Latent-Variable Estimation in Bayesian Semi-Supervised Learning

Keisuke Yamazaki

Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying data-generation process, respectively. Unsupervised learning tasks, such as cluster analysis, are regarded as estimations of latent variables based on the observable ones. The estimation of latent variables in semi-supervised learning, where some labels are observed, will be more precise than that in unsupervised, and one of the concerns is to clarify the effect of the labeled data. However, there has not been sufficient theoretical analysis of the accuracy of the estimation of latent variables. In a previous study, a distribution-based error function was formulated, and its asymptotic form was calculated for unsupervised learning with generative models. It has been shown that, for the estimation of latent variables, the Bayes method is more accurate than the maximum-likelihood method. The present paper reveals the asymptotic forms of the error function in Bayesian semi-supervised learning for both discriminative and generative models. The results show that the generative model, which uses all of the given data, performs better when the model is well specified.

LGOct 19, 2012
Stochastic complexity of Bayesian networks

Keisuke Yamazaki, Sumio Watanbe

Bayesian networks are now being used in enormous fields, for example, diagnosis of a system, data mining, clustering and so on. In spite of their wide range of applications, the statistical properties have not yet been clarified, because the models are nonidentifiable and non-regular. In a Bayesian network, the set of its parameter for a smaller model is an analytic set with singularities in the space of large ones. Because of these singularities, the Fisher information matrices are not positive definite. In other words, the mathematical foundation for learning was not constructed. In recent years, however, we have developed a method to analyze non-regular models using algebraic geometry. This method revealed the relation between the models singularities and its statistical properties. In this paper, applying this method to Bayesian networks with latent variables, we clarify the order of the stochastic complexities.Our result claims that the upper bound of those is smaller than the dimension of the parameter space. This means that the Bayesian generalization error is also far smaller than that of regular model, and that Schwarzs model selection criterion BIC needs to be improved for Bayesian networks.

MLMay 15, 2012
Asymptotic Accuracy of Bayes Estimation for Latent Variables with Redundancy

Keisuke Yamazaki

Hierarchical parametric models consisting of observable and latent variables are widely used for unsupervised learning tasks. For example, a mixture model is a representative hierarchical model for clustering. From the statistical point of view, the models can be regular or singular due to the distribution of data. In the regular case, the models have the identifiability; there is one-to-one relation between a probability density function for the model expression and the parameter. The Fisher information matrix is positive definite, and the estimation accuracy of both observable and latent variables has been studied. In the singular case, on the other hand, the models are not identifiable and the Fisher matrix is not positive definite. Conventional statistical analysis based on the inverse Fisher matrix is not applicable. Recently, an algebraic geometrical analysis has been developed and is used to elucidate the Bayes estimation of observable variables. The present paper applies this analysis to latent-variable estimation and determines its theoretical performance. Our results clarify behavior of the convergence of the posterior distribution. It is found that the posterior of the observable-variable estimation can be different from the one in the latent-variable estimation. Because of the difference, the Markov chain Monte Carlo method based on the parameter and the latent variable cannot construct the desired posterior distribution.

MLApr 10, 2012
Asymptotic Accuracy of Distribution-Based Estimation for Latent Variables

Keisuke Yamazaki

Hierarchical statistical models are widely employed in information science and data engineering. The models consist of two types of variables: observable variables that represent the given data and latent variables for the unobservable labels. An asymptotic analysis of the models plays an important role in evaluating the learning process; the result of the analysis is applied not only to theoretical but also to practical situations, such as optimal model selection and active learning. There are many studies of generalization errors, which measure the prediction accuracy of the observable variables. However, the accuracy of estimating the latent variables has not yet been elucidated. For a quantitative evaluation of this, the present paper formulates distribution-based functions for the errors in the estimation of the latent variables. The asymptotic behavior is analyzed for both the maximum likelihood and the Bayes methods.