Towards Sampling from Nondirected Probabilistic Graphical models using a D-Wave Quantum Annealer
This work addresses efficient sampling from probabilistic graphical models, a problem of practical interest to machine learning practitioners. It presents a quantum-annealing-based sampling method that outperforms classical MCMC on one dataset (OptDigits). The contribution is incremental, since it builds on existing quantum annealing and RBM techniques.
The study addresses sampling from Restricted Boltzmann Machines (RBMs) for tasks such as image classification and generation, using a D-Wave quantum annealer as an alternative to classical Markov Chain Monte Carlo (MCMC). The D-Wave-based classification error was less than half that of MCMC, and the D-Wave sample covered about twice as many local valleys of the energy landscape as the MCMC sample.
A D-Wave quantum annealer (QA) having a 2048-qubit lattice, with no missing qubits or couplings, allowed embedding of a complete graph of a Restricted Boltzmann Machine (RBM). The OptDigits handwritten-digit data set, with 8x7 pixels as visible units, was used to train the RBM using classical Contrastive Divergence. Embedding of the classically trained RBM into the D-Wave lattice was used to demonstrate that the QA offers a high-efficiency alternative to classical Markov Chain Monte Carlo (MCMC) for reconstructing missing labels of the test images, as well as for use as a generative model. At any training iteration, the D-Wave-based classification had a classification error more than two times lower than that of MCMC. The main goal of this study was to investigate the quality of the sample from the RBM model distribution and to compare it with a classical MCMC sample. For the OptDigits dataset, the states in the D-Wave sample belonged to about twice as many local valleys as the states in the MCMC sample. All the lowest-energy (highest joint probability) local minima in the MCMC sample were also found by the D-Wave. The D-Wave missed many of the higher-energy local valleys, while finding many "new" local valleys consistently missed by the MCMC. It was established that the "new" local valleys that the D-Wave finds are important for the model distribution in terms of the energy of the corresponding local minima, the width of the local valleys, and the height of the escape barrier.
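To make the comparison of samples by "local valleys" concrete, the following is a minimal sketch of the classical side only, not the authors' code. It assumes binary 0/1 units and the standard RBM energy E(v, h) = -a·v - b·h - vᵀWh. It also assumes that a state's valley can be identified by greedy single-bit-flip descent to a local minimum; the abstract does not state the paper's exact procedure, so this may differ. All names (`energy`, `gibbs_step`, `descend`, `count_valleys`) are hypothetical, and no D-Wave API is used.

```python
import math
import random


def energy(v, h, W, a, b):
    """Standard RBM energy for binary units: E = -a.v - b.h - v^T W h."""
    e = -sum(ai * vi for ai, vi in zip(a, v))
    e -= sum(bj * hj for bj, hj in zip(b, h))
    e -= sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gibbs_step(v, W, a, b, rng):
    """One block Gibbs sweep (the classical MCMC move): h given v, then v given h."""
    h = [int(rng.random() < sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v)))))
         for j in range(len(b))]
    v_new = [int(rng.random() < sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h)))))
             for i in range(len(a))]
    return v_new, h


def descend(v, h, W, a, b):
    """Greedy single-bit-flip descent (an assumed valley definition).

    Flips are kept only if they strictly lower the energy, so the loop
    terminates at a state where no single flip helps: a local minimum.
    """
    n_v = len(v)
    state = list(v) + list(h)
    e0 = energy(state[:n_v], state[n_v:], W, a, b)
    improved = True
    while improved:
        improved = False
        for k in range(len(state)):
            state[k] ^= 1  # tentatively flip unit k
            e1 = energy(state[:n_v], state[n_v:], W, a, b)
            if e1 < e0 - 1e-12:
                e0 = e1
                improved = True
            else:
                state[k] ^= 1  # revert the flip
    return tuple(state)


def count_valleys(samples, W, a, b):
    """Number of distinct local minima reached from a list of (v, h) samples."""
    return len({descend(v, h, W, a, b) for v, h in samples})


# Toy RBM with 2 visible and 2 hidden units (illustrative values only).
W = [[2, 0], [0, 2]]
a = [-1, -1]
b = [-1, -1]
rng = random.Random(0)
v, h = gibbs_step([1, 0], W, a, b, rng)
print(descend(v, h, W, a, b))
```

Under these assumptions, applying `count_valleys` to an MCMC sample and to an annealer sample of the same model gives two counts that can be compared directly. Comparing the sets of minima themselves, rather than only their counts, shows which valleys each sampler misses.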