Ayaka Sakata

ML
h-index5
16papers
188citations
Novelty52%
AI Score51

16 Papers

MLJun 26, 2022
Prediction Errors for Penalized Regressions based on Generalized Approximate Message Passing

Ayaka Sakata

We discuss the prediction accuracy of assumed statistical models in terms of prediction errors for the generalized linear model and penalized maximum likelihood methods. We derive the forms of estimators for the prediction errors, such as $C_p$ criterion, information criteria, and leave-one-out cross validation (LOOCV) error, using the generalized approximate message passing (GAMP) algorithm and replica method. These estimators coincide with each other when the number of model parameters is sufficiently small; however, there is a discrepancy between them in particular in the parameter region where the number of model parameters is larger than the data dimension. In this paper, we review the prediction errors and corresponding estimators, and discuss their differences. In the framework of GAMP, we show that the information criteria can be expressed by using the variance of the estimates. Further, we demonstrate how to approach LOOCV error from the information criteria by utilizing the expression provided by GAMP.

9.2MLMay 2
Stabilizing Private LASSO under Heterogeneous Covariates via Anisotropic Objective Perturbation

Haruka Tanzawa, Ayaka Sakata

We study high-dimensional LASSO under differential privacy via objective perturbation with heterogeneous covariate scales. In practical scenarios, covariates often exhibit diverse scales; however, standard preprocessing is problematic under privacy constraints, as it consumes additional privacy budget. This heterogeneity induces effective anisotropy in the objective perturbation via the inverse Gram matrix of covariates, which can degrade the stability and accuracy of algorithms. To address this, we propose a Gram-based anisotropic objective perturbation, a ``pre-distortion" strategy that counteracts the distortion from the covariate structure to restore isotropy in the estimation process. Using an Approximate Message Passing (AMP) framework and state evolution analysis, we demonstrate that our proposed perturbation significantly stabilizes convergence and improves both statistical efficiency and privacy performance compared to standard uniform noise injection. Our results provide theoretical insights into designing stable and efficient private estimators without relying on data-dependent preprocessing.

5.5MLMar 27
Privacy-Accuracy Trade-offs in High-Dimensional LASSO under Perturbation Mechanisms

Ayaka Sakata, Haruka Tanzawa

We study privacy-preserving sparse linear regression in the high-dimensional regime, focusing on the LASSO estimator. We analyze two widely used mechanisms for differential privacy: output perturbation, which injects noise into the estimator, and objective perturbation, which adds a random linear term to the loss function. Using approximate message passing (AMP), we characterize the typical behavior of these estimators under random design and privacy noise. To quantify privacy, we adopt typical-case measures, including the on-average KL divergence, which admits a hypothesis-testing interpretation in terms of distinguishability between neighboring datasets. Our analysis reveals that sparsity plays a central role in shaping the privacy-accuracy trade-off: stronger regularization can improve privacy by stabilizing the estimator against single-point data changes. We further show that the two mechanisms exhibit qualitatively different behaviors. In particular, for objective perturbation, increasing the noise level can have non-monotonic effects, and excessive noise may destabilize the estimator, leading to increased sensitivity to data perturbations. Our results demonstrate that AMP provides a powerful framework for analyzing privacy-accuracy trade-offs in high-dimensional sparse models.

MLDec 19, 2025
Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing

Xiaosi Gu, Ayaka Sakata, Tomoyuki Obuchi

We consider sparse signal reconstruction via minimization of the smoothly clipped absolute deviation (SCAD) penalty, and develop one-step replica-symmetry-breaking (1RSB) extensions of approximate message passing (AMP), termed 1RSB-AMP. Starting from the 1RSB formulation of belief propagation, we derive explicit update rules of 1RSB-AMP together with the corresponding state evolution (1RSB-SE) equations. A detailed comparison shows that 1RSB-AMP and 1RSB-SE agree remarkably well at the macroscopic level, even in parameter regions where replica-symmetric (RS) AMP, termed RS-AMP, diverges and where the 1RSB description itself is not expected to be thermodynamically exact. Fixed-point analysis of 1RSB-SE reveals a phase diagram consisting of success, failure, and diverging phases, as in the RS case. However, the diverging-region boundary now depends on the Parisi parameter due to the 1RSB ansatz, and we propose a new criterion -- minimizing the size of the diverging region -- rather than the conventional zero-complexity condition, to determine its value. Combining this criterion with the nonconvexity-control (NCC) protocol proposed in a previous RS study improves the algorithmic limit of perfect reconstruction compared with RS-AMP. Numerical solutions of 1RSB-SE and experiments with 1RSB-AMP confirm that this improved limit is achieved in practice, though the gain is modest and remains slightly inferior to the Bayes-optimal threshold. We also report the behavior of thermodynamic quantities -- overlaps, free entropy, complexity, and the non-self-averaging susceptibility -- that characterize the 1RSB phase in this problem.

MLJan 30, 2024
Effect of Weight Quantization on Learning Models by Typical Case Analysis

Shuhei Kashiwamura, Ayaka Sakata, Masaaki Imaizumi

This paper examines the quantization methods used in large-scale data analysis models and their hyperparameter choices. The recent surge in data analysis scale has significantly increased computational resource requirements. To address this, quantizing model weights has become a prevalent practice in data analysis applications such as deep learning. Quantization is particularly vital for deploying large models on devices with limited computational resources. However, the selection of quantization hyperparameters, like the number of bits and value range for weight quantization, remains an underexplored area. In this study, we employ the typical case analysis from statistical physics, specifically the replica method, to explore the impact of hyperparameters on the quantization of simple learning models. Our analysis yields three key findings: (i) an unstable hyperparameter phase, known as replica symmetry breaking, occurs with a small number of bits and a large quantization width; (ii) there is an optimal quantization width that minimizes error; and (iii) quantization delays the onset of overparameterization, helping to mitigate overfitting as indicated by the double descent phenomenon. We also discover that non-uniform quantization can enhance stability. Additionally, we develop an approximate message-passing algorithm to validate our theoretical results.

MLJan 27, 2025
The Effect of Optimal Self-Distillation in Noisy Gaussian Mixture Model

Kaito Takanami, Takashi Takahashi, Ayaka Sakata

Self-distillation (SD), a technique where a model improves itself using its own predictions, has attracted attention as a simple yet powerful approach in machine learning. Despite its widespread use, the mechanisms underlying its effectiveness remain unclear. In this study, we investigate the efficacy of hyperparameter-tuned multi-stage SD with a linear classifier for binary classification on noisy Gaussian mixture data. For the analysis, we employ the replica method from statistical physics. Our findings reveal that the primary driver of SD's performance improvement is denoising through hard pseudo-labels, with the most notable gains observed in moderately sized datasets. We also identify two practical heuristics to enhance SD: early stopping that limits the number of stages, which is broadly effective, and bias parameter fixing, which helps under label imbalance. To empirically validate our theoretical findings derived from our toy model, we conduct additional experiments on CIFAR-10 classification using pretrained ResNet backbone. These results provide both theoretical and practical insights, advancing our understanding and application of SD in noisy settings.

MLOct 12, 2025
High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator

Yuma Ichikawa, Shuhei Kashiwamura, Ayaka Sakata

Quantized neural network training optimizes a discrete, non-differentiable objective. The straight-through estimator (STE) enables backpropagation through surrogate gradients and is widely used. While previous studies have primarily focused on the properties of surrogate gradients and their convergence, the influence of quantization hyperparameters, such as bit width and quantization range, on learning dynamics remains largely unexplored. We theoretically show that in the high-dimensional limit, STE dynamics converge to a deterministic ordinary differential equation. This reveals that STE training exhibits a plateau followed by a sharp drop in generalization error, with plateau length depending on the quantization range. A fixed-point analysis quantifies the asymptotic deviation from the unquantized linear model. We also extend analytical techniques for stochastic gradient descent to nonlinear transformations of weights and inputs.

MLJul 27, 2020
Active pooling design in group testing based on Bayesian posterior prediction

Ayaka Sakata

In identifying infected patients in a population, group testing is an effective method to reduce the number of tests and correct the test errors. In the group testing procedure, tests are performed on pools of specimens collected from patients, where the number of pools is lower than that of patients. The performance of group testing heavily depends on the design of pools and algorithms that are used in inferring the infected patients from the test outcomes. In this paper, an adaptive design method of pools based on the predictive distribution is proposed in the framework of Bayesian inference. The proposed method executed using the belief propagation algorithm results in more accurate identification of the infected patients, as compared to the group testing performed on random pools determined in advance.

MLApr 28, 2020
Bayesian inference of infected patients in group testing with prevalence estimation

Ayaka Sakata

Group testing is a method of identifying infected patients by performing tests on a pool of specimens collected from patients. For the case in which the test returns a false result with finite probability, we propose Bayesian inference and a corresponding belief propagation (BP) algorithm to identify the infected patients from the results of tests performed on the pool. We show that the true-positive rate is improved by taking into account the credible interval of a point estimate of each patient. Further, the prevalence and the error probability in the test are estimated by combining an expectation-maximization method with the BP algorithm. As another approach, we introduce a hierarchical Bayes model to identify the infected patients and estimate the prevalence. By comparing these methods, we formulate a guide for practical usage.

MLFeb 27, 2019
Cross validation in sparse linear regression with piecewise continuous nonconvex penalties and its acceleration

Tomoyuki Obuchi, Ayaka Sakata

We investigate the signal reconstruction performance of sparse linear regression in the presence of noise when piecewise continuous nonconvex penalties are used. Among such penalties, we focus on the SCAD penalty. The contributions of this study are three-fold: We first present a theoretical analysis of a typical reconstruction performance, using the replica method, under the assumption that each component of the design matrix is given as an independent and identically distributed (i.i.d.) Gaussian variable. This clarifies the superiority of the SCAD estimator compared with $\ell_1$ in a wide parameter range, although the nonconvex nature of the penalty tends to lead to solution multiplicity in certain regions. This multiplicity is shown to be connected to replica symmetry breaking in the spin-glass theory. We also show that the global minimum of the mean square error between the estimator and the true signal is located in the replica symmetric phase. Second, we develop an approximate formula efficiently computing the cross-validation error without actually conducting the cross-validation, which is also applicable to the non-i.i.d. design matrices. It is shown that this formula is only applicable to the unique solution region and tends to be unstable in the multiple solution region. We implement instability detection procedures, which allows the approximate formula to stand alone and resultantly enables us to draw phase diagrams for any specific dataset. Third, we propose an annealing procedure, called nonconvexity annealing, to obtain the solution path efficiently. Numerical simulations are conducted on simulated datasets to examine these results to verify the theoretical results consistency and the approximate formula efficiency. Another numerical experiment on a real-world dataset is conducted; its results are consistent with those of earlier studies using the $\ell_0$ formulation.

MLFeb 20, 2019
Perfect reconstruction of sparse signals with piecewise continuous nonconvex penalties and nonconvexity control

Ayaka Sakata, Tomoyuki Obuchi

We consider compressed sensing formulated as a minimization problem of nonconvex sparse penalties, Smoothly Clipped Absolute deviation (SCAD) and Minimax Concave Penalty (MCP). The nonconvexity of these penalties is controlled by nonconvexity parameters, and L1 penalty is contained as a limit with respect to these parameters. The analytically derived reconstruction limit overcomes that of L1 and the algorithmic limit in the Bayes-optimal setting, when the nonconvexity parameters have suitable values. However, for small nonconvexity parameters, where the reconstruction of the relatively dense signals is theoretically guaranteed, the corresponding approximate message passing (AMP) cannot achieve perfect reconstruction. We identify that the shrinks in the basin of attraction to the perfect reconstruction causes the discrepancy between the AMP and corresponding theory using state evolution. A part of the discrepancy is resolved by introducing the control of the nonconvexity parameters to guide the AMP trajectory to the basin of the attraction.

MLFeb 20, 2018
Estimator of Prediction Error Based on Approximate Message Passing for Penalized Linear Regression

Ayaka Sakata

We propose an estimator of prediction error using an approximate message passing (AMP) algorithm that can be applied to a broad range of sparse penalties. Following Stein's lemma, the estimator of the generalized degrees of freedom, which is a key quantity for the construction of the estimator of the prediction error, is calculated at the AMP fixed point. The resulting form of the AMP-based estimator does not depend on the penalty function, and its value can be further improved by considering the correlation between predictors. The proposed estimator is asymptotically unbiased when the components of the predictors and response variables are independently generated according to a Gaussian distribution. We examine the behaviour of the estimator for real data under nonconvex sparse penalties, where Akaike's information criterion does not correspond to an unbiased estimator of the prediction error. The model selected by the proposed estimator is close to that which minimizes the true prediction error.

MLNov 8, 2017
Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis

Ayaka Sakata, Yingying Xu

We analyse a linear regression problem with nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm considering nonconvex regularization, namely SCAD-AMP, and analytically show that the stability condition corresponds to the de Almeida--Thouless condition in spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric solution. Numerical experiments confirm that for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between replica symmetric (RS) and replica symmetry breaking (RSB) region is found in the parameter space of SCAD. The appearance of the RS region for a nonconvex penalty is a significant advantage that indicates the region of smooth landscape of the optimization problem. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of L1-based methods, and the minimum representation error under RS assumption is obtained at the edge of the RS/RSB phase. The correspondence between the convergence of the existing coordinate descent algorithm and RS/RSB transition is also indicated.

NAFeb 6, 2014
Phase transitions and sample complexity in Bayes-optimal matrix factorization

Yoshiyuki Kabashima, Florent Krzakala, Marc Mézard et al.

We analyse the matrix factorization problem. Given a noisy measurement of a product of two matrices, the problem is to estimate back the original matrices. It arises in many applications such as dictionary learning, blind matrix calibration, sparse principal component analysis, blind source separation, low rank matrix completion, robust principal component analysis or factor analysis. It is also important in machine learning: unsupervised representation learning can often be studied through matrix factorization. We use the tools of statistical mechanics - the cavity and replica methods - to analyze the achievability and computational tractability of the inference problems in the setting of Bayes-optimal inference, which amounts to assuming that the two matrices have random independent elements generated from some known distribution, and this information is available to the inference algorithm. In this setting, we compute the minimal mean-squared-error achievable in principle in any computational time, and the error that can be achieved by an efficient approximate message passing algorithm. The computation is based on the asymptotic state-evolution analysis of the algorithm. The performance that our analysis predicts, both in terms of the achieved mean-squared-error, and in terms of sample complexity, is extremely promising and motivating for a further development of the algorithm.

LGJan 26, 2013
Sample Complexity of Bayesian Optimal Dictionary Learning

Ayaka Sakata, Yoshiyuki Kabashima

We consider a learning problem of identifying a dictionary matrix D (M times N dimension) from a sample set of M dimensional vectors Y = N^{-1/2} DX, where X is a sparse matrix (N times P dimension) in which the density of non-zero entries is 0<rho< 1. In particular, we focus on the minimum sample size P_c (sample complexity) necessary for perfectly identifying D of the optimal learning scheme when D and X are independently generated from certain distributions. By using the replica method of statistical mechanics, we show that P_c=O(N) holds as long as alpha = M/N >rho is satisfied in the limit of N to infinity. Our analysis also implies that the posterior distribution given Y is condensed only at the correct dictionary D when the compression rate alpha is greater than a certain critical value alpha_M(rho). This suggests that belief propagation may allow us to learn D with a low computational complexity using O(N) samples.

DIS-NNMar 28, 2012
Statistical Mechanics of Dictionary Learning

Ayaka Sakata, Yoshiyuki Kabashima

Finding a basis matrix (dictionary) by which objective signals are represented sparsely is of major relevance in various scientific and technological fields. We consider a problem to learn a dictionary from a set of training signals. We employ techniques of statistical mechanics of disordered systems to evaluate the size of the training set necessary to typically succeed in the dictionary learning. The results indicate that the necessary size is much smaller than previously estimated, which theoretically supports and/or encourages the use of dictionary learning in practical situations.