Sample Complexity of Bayesian Optimal Dictionary Learning
This work addresses the sample complexity for dictionary learning, which is incremental as it builds on prior statistical mechanics methods to derive theoretical bounds.
The paper tackles the problem of determining the minimum sample size needed for perfect dictionary identification in a Bayesian optimal learning scheme, showing that P_c=O(N) samples are sufficient when the compression rate alpha exceeds the sparsity density rho.
We consider a learning problem of identifying a dictionary matrix D (M times N dimension) from a sample set of M dimensional vectors Y = N^{-1/2} DX, where X is a sparse matrix (N times P dimension) in which the density of non-zero entries is 0<rho< 1. In particular, we focus on the minimum sample size P_c (sample complexity) necessary for perfectly identifying D of the optimal learning scheme when D and X are independently generated from certain distributions. By using the replica method of statistical mechanics, we show that P_c=O(N) holds as long as alpha = M/N >rho is satisfied in the limit of N to infinity. Our analysis also implies that the posterior distribution given Y is condensed only at the correct dictionary D when the compression rate alpha is greater than a certain critical value alpha_M(rho). This suggests that belief propagation may allow us to learn D with a low computational complexity using O(N) samples.