LGJun 20, 2024
A General Control-Theoretic Approach for Reinforcement Learning: Theory and AlgorithmsWeiqin Chen, Mark S. Squillante, Chai Wah Wu et al.
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a specific gradient ascent algorithm based on this theorem within the context of a specific control-theoretic framework. We empirically evaluate the performance of our control theoretic approach on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time of our approach over state-of-the-art methods.
QUANT-PHDec 29, 2021
Active Learning of Quantum System Hamiltonians yields Query AdvantageArkopal Dutt, Edwin Pednault, Chai Wah Wu et al.
Hamiltonian learning is an important procedure in quantum system identification, calibration, and successful operation of quantum computers. Through queries to the quantum system, this procedure seeks to obtain the parameters of a given Hamiltonian model and description of noise sources. Standard techniques for Hamiltonian learning require careful design of queries and $O(ε^{-2})$ queries in achieving learning error $ε$ due to the standard quantum limit. With the goal of efficiently and accurately estimating the Hamiltonian parameters within learning error $ε$ through minimal queries, we introduce an active learner that is given an initial set of training examples and the ability to interactively query the quantum system to generate new training data. We formally specify and experimentally assess the performance of this Hamiltonian active learning (HAL) algorithm for learning the six parameters of a two-qubit cross-resonance Hamiltonian on four different superconducting IBM Quantum devices. Compared with standard techniques for the same problem and a specified learning error, HAL achieves up to a $99.8\%$ reduction in queries required, and a $99.1\%$ reduction over the comparable non-adaptive learning algorithm. Moreover, with access to prior information on a subset of Hamiltonian parameters and given the ability to select queries with linearly (or exponentially) longer system interaction times during learning, HAL can exceed the standard quantum limit and achieve Heisenberg (or super-Heisenberg) limited convergence rates during learning.
ARFeb 22, 2021
Dither computing: a hybrid deterministic-stochastic computing frameworkChai Wah Wu
Stochastic computing has a long history as an alternative method of performing arithmetic on a computer. While it can be considered an unbiased estimator of real numbers, it has a variance and MSE on the order of $Ω(\frac{1}{N})$. On the other hand, deterministic variants of stochastic computing remove the stochastic aspect, but cannot approximate arbitrary real numbers with arbitrary precision and are biased estimators. However, they have an asymptotically superior MSE on the order of $O(\frac{1}{N^2})$. Recent results in deep learning with stochastic rounding suggest that the bias in the rounding can degrade performance. We proposed an alternative framework, called dither computing, that combines aspects of stochastic computing and its deterministic variants and that can perform computing with similar efficiency, is unbiased, and with a variance and MSE also on the optimal order of $Θ(\frac{1}{N^2})$. We also show that it can be beneficial in stochastic rounding applications as well. We provide implementation details and give experimental results to comparatively show the benefits of the proposed scheme.
LGMay 28, 2019
A General Markov Decision Process Framework for Directly Learning Optimal Control PoliciesYingdong Lu, Mark S. Squillante, Chai Wah Wu
We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy and a general Markov decision process (MDP) framework devised to support these opportunities. Derivations of general classes of our control-based RL methods are presented, together with forms of exploration and exploitation in learning and applying the optimal control policy over time. Our general MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. We establish the convergence and optimality-both in general and within various control paradigms (e.g., piecewise linear control policies)-of our control-based methods through this general MDP framework, including convergence of $Q$-learning within the context of our MDP framework. Our empirical results demonstrate and quantify the significant benefits of our approach.
LGMay 25, 2019
TableNet: a multiplier-less implementation of neural networks for inferencingChai Wah Wu
We consider the use of look-up tables (LUT) to simplify the hardware implementation of a deep learning network for inferencing after weights have been successfully trained. The use of LUT replaces the matrix multiply and add operations with a small number of LUTs and addition operations resulting in a completely multiplier-less implementation. We compare the different tradeoffs of this approach in terms of accuracy versus LUT size and the number of operations and show that similar performance can be obtained with a comparable memory footprint as a full precision deep neural network, but without the use of any multipliers. We illustrate this with several architectures such as MLP and CNN.
LGSep 6, 2018
ProdSumNet: reducing model parameters in deep neural networks via product-of-sums matrix decompositionsChai Wah Wu
We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed deep learning architectures such as CNN, KFC, Dilated CNN, etc. are all subsumed in this framework and we illustrate other types of neural network architectures within this framework. We show that good accuracy on MNIST and Fashion MNIST can be obtained using a relatively small number of trainable parameters. In addition, since implementation of the convolutional layer is resource-heavy, we consider an approach in the transform domain that obviates the need for convolutional layers. One of the advantages of this general framework over prior approaches is that the number of trainable parameters is not fixed and can be varied arbitrarily. In particular, we illustrate the tradeoff of varying the number of trainable variables and the corresponding error rate. As an example, by using this decomposition on a reference CNN architecture for MNIST with over 3x10^6 trainable parameters, we are able to obtain an accuracy of 98.44% using only 3554 trainable parameters.
MLMay 21, 2018
A General Family of Robust Stochastic Operators for Reinforcement LearningYingdong Lu, Mark S. Squillante, Chai Wah Wu
We consider a new family of operators for reinforcement learning with the goal of alleviating the negative effects and becoming more robust to approximation or estimation errors. Various theoretical results are established, which include showing on a sample path basis that our family of operators preserve optimality and increase the action gap. Our empirical results illustrate the strong benefits of our family of operators, significantly outperforming the classical Bellman operator and recently proposed operators.
LGMay 18, 2018
Can machine learning identify interesting mathematics? An exploration using empirically observed lawsChai Wah Wu
We explore the possibility of using machine learning to identify interesting mathematical structures by using certain quantities that serve as fingerprints. In particular, we extract features from integer sequences using two empirical laws: Benford's law and Taylor's law and experiment with various classifiers to identify whether a sequence is, for example, nice, important, multiplicative, easy to compute or related to primes or palindromes.
SPMay 18, 2018
Designing communication systems via iterative improvement: error correction coding with Bayes decoder and codebook optimized for source symbol errorChai Wah Wu
In most error correction coding (ECC) frameworks, the typical error metric is the bit error rate (BER) which measures the number of bit errors. For this metric, the positions of the bits are not relevant to the decoding, and in many noise models, not relevant to the BER either. In many applications this is unsatisfactory as typically all bits are not equal and have different significance. We consider the problem of bit error correction and mitigation where bits in different positions have different importance. For error correction, we look at ECC from a Bayesian perspective and introduce Bayes estimators with general loss functions to take into account the bit significance. We propose ECC schemes that optimize this error metric. As the problem is highly nonlinear, traditional ECC construction techniques are not applicable. Using exhaustive search is cost prohibitive, and thus we use iterative improvement search techniques to find good codebooks. We optimize both general codebooks and linear codes. We provide numerical experiments to show that they can be superior to classical linear block codes such as Hamming codes and decoding methods such as minimum distance decoding. For error mitigation, we study the case where ECC is not possible or not desirable, but significance aware encoding of information is still beneficial in reducing the average error. We propose a novel number presentation format suitable for emerging storage media where the noise magnitude is unknown and possibly large and show that it has lower mean error than the traditional number format.