On Learning Sparsely Used Dictionaries from Incomplete Samples
This addresses a practical issue in applications like hyper-spectral imaging or blood glucose monitoring where data is often incomplete, offering a novel solution for dictionary learning in such settings.
The paper tackles the problem of learning sparsely used dictionaries from incomplete data samples, where only a fraction of entries are observed, and provides provably correct polynomial-time algorithms that converge to the true dictionary with theoretical guarantees.
Most existing algorithms for dictionary learning assume that all entries of the (high-dimensional) input data are fully observed. However, in several practical applications (such as hyper-spectral imaging or blood glucose monitoring), only an incomplete fraction of the data entries may be available. For incomplete settings, no provably correct and polynomial-time algorithm has been reported in the dictionary learning literature. In this paper, we provide provable approaches for learning - from incomplete samples - a family of dictionaries whose atoms have sufficiently "spread-out" mass. First, we propose a descent-style iterative algorithm that linearly converges to the true dictionary when provided a sufficiently coarse initial estimate. Second, we propose an initialization algorithm that utilizes a small number of extra fully observed samples to produce such a coarse initial estimate. Finally, we theoretically analyze their performance and provide asymptotic statistical and computational guarantees.