Daoyi Dong

h-index: 41

Papers: 30

Citations: 591

Novelty: 46%

AI Score: 43

Ranked #55,174 of 194,257 authors (top 28%); #12,556 in LG (top 31%)

30 Papers

3.3 SY Mar 21, 2016

Performance Analysis and Coherent Guaranteed Cost Control for Uncertain Quantum Systems Using Small Gain and Popov Methods

Chengdi Xiang, Ian R. Petersen, Daoyi Dong

This paper extends applications of the quantum small gain and Popov methods from existing results on robust stability to performance analysis results for a class of uncertain quantum systems. This class of systems involves a nominal linear quantum system and is subject to quadratic perturbations in the system Hamiltonian. Based on these two methods, coherent guaranteed cost controllers are designed for a given quantum system to achieve improved control performance. An illustrative example also shows that the quantum Popov approach can obtain less conservative results than the quantum small gain approach for the same uncertain quantum system.

1.2 SY Jun 7, 2018

Fault-Tolerant Control of Linear Quantum Stochastic Systems

Shi Wang, Daoyi Dong

In quantum engineering, faults may occur in a quantum control system, which can render the system unstable or degrade other relevant aspects of its performance. This note presents an estimator-based fault-tolerant control design approach for a class of linear quantum stochastic systems subject to fault signals. In this approach, the fault signals and some commutative components of the quantum system observables are estimated, and a fault-tolerant controller is designed to compensate for the effect of the fault signals. Numerical procedures are developed for controller design, and an example is presented to demonstrate the proposed design approach.

16.5 LG May 22, 2022

A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning

Zhi Wang, Chunlin Chen, Daoyi Dong

While reinforcement learning (RL) algorithms are achieving state-of-the-art performance in various challenging tasks, they can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information. In this paper, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the non-stationary task distribution, which captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP) that instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian non-parametric framework with an expectation maximization (EM) procedure, which dynamically adapts the model complexity without explicit task boundaries or heuristics. Moreover, we use the domain randomization technique to train robust prior parameters for the initialization of each task model in the mixture, so that the resulting model can better generalize and adapt to unseen tasks. With extensive experiments conducted on robot navigation and locomotion domains, we show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
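
As a rough illustration of the Chinese restaurant process prior mentioned in this abstract, the sketch below samples cluster assignments for a stream of tasks. The function name, the concentration parameter `alpha`, and the fixed seed are assumptions made for illustration; the paper's actual mixture update (EM, task models, domain randomization) is not reproduced here.

```python
import random


def sample_crp_assignments(num_tasks, alpha, seed=0):
    """Sample cluster assignments from a Chinese restaurant process.

    Task n (0-indexed) joins existing cluster k with probability
    count_k / (n + alpha) and opens a new cluster with probability
    alpha / (n + alpha).
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of tasks currently in cluster k
    assignments = []  # assignments[n] = cluster index chosen for task n
    for _ in range(num_tasks):
        # Unnormalised weights: one per existing cluster, then the new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # instantiate a new mixture component
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


assignments, counts = sample_crp_assignments(num_tasks=20, alpha=1.0)
print(assignments, counts)
```

Larger values of `alpha` make new clusters more likely, which is the mechanism that lets the number of mixture components grow as needed.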

1.2 QUANT-PH Mar 27, 2017

Hybrid Filtering for a Class of Quantum Systems with Classical Disturbances

Qi Yu, Daoyi Dong, Ian R. Petersen et al.

A filtering problem for a class of quantum systems disturbed by a classical stochastic process is investigated in this paper. The classical disturbance process, which is assumed to be described by a linear stochastic differential equation, is modeled by a quantum cavity model. Then the hybrid quantum-classical system is described by a combined quantum system consisting of two quantum cavity subsystems. Quantum filtering theory and a quantum extended Kalman filter method are employed to estimate the states of the combined quantum system. An estimate of the classical stochastic process is derived from the estimate of the combined quantum system. The effectiveness and performance of the proposed methods are illustrated by numerical results.

1.8 LG Mar 6, 2022

Depthwise Convolution for Multi-Agent Communication with Enhanced Mean-Field Approximation

Donghan Xie, Zhi Wang, Chunlin Chen et al.

Multi-agent settings remain a fundamental challenge in the reinforcement learning (RL) domain due to partial observability and the lack of accurate real-time interactions across agents. In this paper, we propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge with a large number of coexisting agents. First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations and learn local communication between neighboring agents. To facilitate multi-agent coordination, we explicitly learn the effect of joint actions by taking the policies of neighboring agents as inputs. Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions. To coordinate the behaviors of neighboring agents more effectively, we enhance the mean-field approximation with a supervised policy rectification network (PRN) for rectifying real-time agent interactions and with a learnable compensation term for correcting the approximation bias. The proposed method enables efficient coordination and outperforms several baseline approaches on the adaptive traffic signal control (ATSC) task and the StarCraft II multi-agent challenge (SMAC).

4.6 LG Apr 16, 2022

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Jinmei Liu, Zhi Wang, Chunlin Chen et al.

Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief based on some observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal that contains limited information and cannot be obtained until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of the tabular-based observation model, which may be expensive and even infeasible to learn and maintain, especially when using the state transition sample as the signal. Hence, we propose a scalable observation model based on fitting state transition functions of source tasks from only a small number of samples, which can generalize to any signals observed in the target task. Moreover, we extend the offline-mode BPR to the continual learning setting by expanding the scalable observation model in a plug-and-play fashion, which can avoid negative transfer when faced with new unknown tasks. Experimental results show that our method can consistently facilitate faster and more efficient policy transfer.
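
The task-belief inference at the core of BPR can be sketched as a single Bayes-rule update over source tasks. This is a minimal sketch under stated assumptions: the task names, the likelihood values, and the greedy selection rule are hypothetical, and the likelihoods would in practice come from an observation model such as the one the abstract describes.

```python
def update_belief(belief, likelihoods):
    """One Bayes-rule step: posterior(task) is proportional to
    likelihood(signal | task) * prior(task)."""
    unnormalised = {task: likelihoods[task] * p for task, p in belief.items()}
    total = sum(unnormalised.values())
    if total == 0.0:
        # The signal is impossible under every source task; keep the prior.
        return dict(belief)
    return {task: w / total for task, w in unnormalised.items()}


def select_policy(belief):
    """Greedy simplification: reuse the policy of the most probable task."""
    return max(belief, key=belief.get)


# Hypothetical two-task library with a uniform prior.
belief = {"task_a": 0.5, "task_b": 0.5}
# Hypothetical likelihoods of one observed state transition under each task.
likelihoods = {"task_a": 0.9, "task_b": 0.1}
belief = update_belief(belief, likelihoods)
print(belief, select_policy(belief))
```

Because a state transition is available at every step, an update like this can run within an episode rather than only at its end, which is the efficiency argument made in the abstract.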

1.2 SY Jun 8, 2018

Several recent developments in estimation and robust control of quantum systems

Daoyi Dong, Yuanlong Wang

This paper summarizes several recent developments in the area of estimation and robust control of quantum systems and outlines several directions for future research. Quantum state tomography via linear regression estimation and adaptive quantum state estimation are introduced and a Hamiltonian identification algorithm is outlined. Two quantum robust control approaches including sliding mode control and sampling-based learning control are illustrated.

2.3 SY Jan 20, 2025

Fast State Stabilization using Deep Reinforcement Learning for Measurement-based Quantum Feedback Control

Chunxiang Song, Yanan Liu, Daoyi Dong et al.

The stabilization of quantum states is a fundamental problem for realizing various quantum technologies. Measurement-based feedback strategies have demonstrated powerful performance, and the construction of quantum control signals using measurement information has attracted great interest. However, the interaction between quantum systems and the environment is inevitable, especially when measurements are introduced, which leads to decoherence. To mitigate decoherence, it is desirable to stabilize quantum systems faster, thereby reducing the time of interaction with the environment. In this paper, we utilize information obtained from measurement and apply deep reinforcement learning (DRL) algorithms, without explicitly constructing specific complex measurement-control mappings, to rapidly drive a random initial quantum state to the target state. The proposed DRL algorithm can speed up convergence to a target state, which shortens the interaction between quantum systems and their environments to protect coherence. Simulations are performed on two-qubit and three-qubit systems, and the results show that our algorithm can successfully stabilize a randomly initialized quantum system to the target entangled state, with a shorter convergence time than traditional methods such as Lyapunov feedback control and several DRL algorithms with different reward functions. Moreover, it exhibits robustness against imperfect measurements and delays in system evolution.

4.3 QUANT-PH Feb 28, 2023

Auxiliary Task-based Deep Reinforcement Learning for Quantum Control

Shumin Zhou, Hailan Ma, Sen Kuang et al.

Because it does not require prior knowledge of the environment, reinforcement learning has significant potential for quantum control problems. In this work, we investigate the effectiveness of continuous control policies based on deep deterministic policy gradient. To address the sparse reward signal in quantum learning control problems, we propose an auxiliary task-based deep reinforcement learning (AT-DRL) method for quantum control. In particular, we first design a guided reward function based on the fidelity of quantum states that enables incremental fidelity improvement. Then, we introduce the concept of an auxiliary task whose network shares parameters with the main network to predict the reward provided by the environment (called the main task). The auxiliary task learns synchronously with the main task, allowing one to select the most relevant features of the environment, thus aiding the agent in comprehending how to achieve the desired state. The numerical simulations demonstrate that the proposed AT-DRL can provide a solution to the sparse reward problem in quantum systems, and has great potential in designing control pulses that achieve efficient quantum state preparation.

2.3 QUANT-PH Sep 30, 2023

Learning Informative Latent Representation for Quantum State Tomography

Hailan Ma, Zhenhong Sun, Daoyi Dong et al.

Quantum state tomography (QST) is the process of reconstructing the complete state of a quantum system (mathematically described as a density matrix) through a series of different measurements. These measurements are performed on a number of identical copies of the quantum system, with outcomes gathered as frequencies. QST aims to recover the density matrix or the properties of the quantum state from the measured frequencies. Although an informationally complete set of measurements can specify the quantum state accurately in an ideal scenario with a large number of identical copies, both the measurements and identical copies are restricted and imperfect in practical scenarios, making QST highly ill-posed. The conventional QST methods usually assume accurate measured frequencies or rely on manually designed regularizers to handle the ill-posed reconstruction problem, suffering from limited applications in realistic scenarios. Recent advances in deep neural networks (DNN) led to the emergence of deep learning in QST. However, existing DL-based QST approaches often employ generic DNN models that are not optimized for imperfect conditions of QST. In this paper, we propose a transformer-based autoencoder architecture tailored for QST with imperfect measurement data. Our method leverages a transformer-based encoder to extract an informative latent representation (ILR) from imperfect measurement data and employs a decoder to predict the quantum states based on the ILR. We anticipate that the high-dimensional ILR will capture more comprehensive information about the quantum states. To achieve this, we conduct pre-training of the encoder using a pretext task that involves reconstructing high-quality frequencies from measured frequencies. Extensive simulations and experiments demonstrate the remarkable ability of the informative latent representation to deal with imperfect measurement data in QST.

1.2 QUANT-PH Feb 23, 2025

Learning-Based Design of LQG Controllers in Quantum Coherent Feedback

Chunxiang Song, Yanan Liu, Guofeng Zhang et al.

In this paper, we propose a differential evolution (DE) algorithm specifically tailored for the design of Linear-Quadratic-Gaussian (LQG) controllers in quantum systems. Building upon the foundational DE framework, the algorithm incorporates specialized modules, including relaxed feasibility rules, a scheduled penalty function, adaptive search range adjustment, and the "bet-and-run" initialization strategy. These enhancements improve the algorithm's exploration and exploitation capabilities while addressing the unique physical realizability requirements of quantum systems. The proposed method is applied to a quantum optical system, where three distinct controllers with varying configurations relative to the plant are designed. The resulting controllers demonstrate superior performance, achieving lower LQG performance indices compared to existing approaches. Additionally, the algorithm ensures that the designs comply with physical realizability constraints, guaranteeing compatibility with practical quantum platforms. The proposed approach holds significant potential for application to other linear quantum systems in performance optimization tasks subject to physically feasible constraints.
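
For readers unfamiliar with the foundational DE framework this abstract builds on, the following is a minimal sketch of the classic DE/rand/1/bin scheme applied to a toy objective. It deliberately omits the paper's specialized modules (relaxed feasibility rules, scheduled penalty, adaptive search range, bet-and-run initialization); the function names, parameter values, and the sphere objective are assumptions for illustration only.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over box `bounds` with plain DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or j == j_rand:
                    # Binomial crossover takes the mutant component.
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search range
                else:
                    v = pop[i][j]
                trial.append(v)
            # Greedy selection: keep the trial only if it is no worse.
            trial_cost = objective(trial)
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_cost)
```

In a constrained design problem such as the one described, the greedy selection step is where feasibility rules or penalty terms would typically enter.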

11.1 AI Apr 21, 2025 Code

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision

Shilin Zhang, Zican Hu, Wenhao Wu et al.

Offline meta-RL usually tackles generalization by inferring task beliefs from high-quality samples or warmup explorations. This restricted form limits its generality and usability, since these supervision signals are expensive and even infeasible to acquire in advance for unseen tasks. Learning directly from the raw text about decision tasks is a promising alternative to leverage a much broader source of supervision. In this paper, we propose Text-to-Decision Agent (T2DA), a simple and scalable framework that supervises offline meta-RL with natural language. We first introduce a generalized world model to encode multi-task decision data into a dynamics-aware embedding space. Then, inspired by CLIP, we predict which textual description goes with which decision embedding, effectively bridging their semantic gap via contrastive language-decision pre-training and aligning the text embeddings to comprehend the environment dynamics. After training the text-conditioned generalist policy, the agent can directly realize zero-shot text-to-decision generation in response to language instructions. Comprehensive experiments on MuJoCo and Meta-World benchmarks show that T2DA facilitates high-capacity zero-shot generalization and outperforms various types of baselines. Our code is available at https://github.com/NJU-RL/T2DA.

5.2 CV Dec 18, 2024 Code

T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Generation

Zhenhong Sun, Yifu Wang, Yonhon Ng et al.

Scene generation is crucial to many computer graphics applications. Recent advances in generative AI have streamlined sketch-to-image workflows, easing the workload for artists and designers in creating scene concept art. However, these methods often struggle with complex scenes containing multiple detailed objects, sometimes missing small or uncommon instances. In this paper, we propose a Training-free Triplet Tuning for Sketch-to-Scene (T3-S2S) generation after reviewing the entire cross-attention mechanism. This scheme revitalizes the existing ControlNet model, enabling effective handling of multi-instance generation through three components: prompt balance, characteristics prominence, and dense tuning. Specifically, this approach enhances keyword representation via the prompt balance module, reducing the risk of missing critical instances. It also includes a characteristics prominence module that highlights TopK indices in each channel, ensuring essential features are better represented based on token sketches. Additionally, it employs dense tuning to refine contour details in the attention map, compensating for instance-related regions. Experiments validate that our triplet tuning approach substantially improves the performance of existing sketch-to-image models. It consistently generates detailed, multi-instance 2D images, closely adhering to the input prompts and enhancing visual quality in complex multi-instance scenes. Code is available at https://github.com/chaos-sun/t3s2s.git.

4.1 LG Oct 31, 2025

QiNN-QJ: A Quantum-inspired Neural Network with Quantum Jump for Multimodal Sentiment Analysis

Yiwei Chen, Kehuan Yan, Yu Pan et al.

Quantum theory provides non-classical principles, such as superposition and entanglement, that inspire promising paradigms in machine learning. However, most existing quantum-inspired fusion models rely solely on unitary or unitary-like transformations to generate quantum entanglement. While theoretically expressive, such approaches often suffer from training instability and limited generalizability. In this work, we propose a Quantum-inspired Neural Network with Quantum Jump (QiNN-QJ) for multimodal entanglement modelling. Each modality is first encoded as a quantum pure state, after which a differentiable module simulating the QJ operator transforms the separable product state into the entangled representation. By jointly learning Hamiltonian and Lindblad operators, QiNN-QJ generates controllable cross-modal entanglement among modalities with dissipative dynamics, where structured stochasticity and steady-state attractor properties serve to stabilize training and constrain entanglement shaping. The resulting entangled states are projected onto trainable measurement vectors to produce predictions. In addition to achieving superior performance over the state-of-the-art models on benchmark datasets, including CMU-MOSI, CMU-MOSEI, and CH-SIMS, QiNN-QJ facilitates enhanced post-hoc interpretability through von Neumann entanglement entropy. This work establishes a principled framework for entangled multimodal fusion and paves the way for quantum-inspired approaches in modelling complex cross-modal correlations.
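
To make the von Neumann entanglement entropy used for interpretability concrete, the sketch below computes it for the simplest case, a two-qubit pure state. This is a simplification chosen for illustration: the function name is hypothetical, and the paper's multimodal states would have more subsystems and higher dimensions than the two-qubit example assumed here.

```python
import math


def two_qubit_entanglement_entropy(amplitudes):
    """Von Neumann entropy (in bits) of one qubit of a two-qubit pure state.

    amplitudes = (a00, a01, a10, a11) in the computational basis;
    the state is normalised internally.
    """
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
    a00, a01, a10, a11 = (a / norm for a in amplitudes)
    # The reduced density matrix has trace 1 and determinant
    # |a00*a11 - a01*a10|^2.
    det = abs(a00 * a11 - a01 * a10) ** 2
    # Its eigenvalues are therefore (1 +/- sqrt(1 - 4*det)) / 2.
    root = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1.0 + root) / 2.0, (1.0 - root) / 2.0]
    return -sum(p * math.log2(p) for p in eigenvalues if p > 0.0)


product_state = (1.0, 0.0, 0.0, 0.0)                          # |00>
bell_state = (1.0 / math.sqrt(2), 0.0, 0.0, 1.0 / math.sqrt(2))
print(two_qubit_entanglement_entropy(product_state))
print(two_qubit_entanglement_entropy(bell_state))
```

A separable product state gives zero entropy and a maximally entangled Bell state gives one bit, so the quantity acts as a scale for how strongly two subsystems are correlated.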

16.9 LG Jun 5, 2025 Code

Mixture-of-Experts Meets In-Context Reinforcement Learning

Wenhao Wu, Fuhong Liu, Haoru Li et al.

In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning. However, two notable challenges remain in fully harnessing in-context learning within RL domains: the intrinsic multi-modality of the state-action-reward data and the diverse, heterogeneous nature of decision tasks. To tackle these challenges, we propose T2MIR (Token- and Task-wise MoE for In-context RL), an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts for managing a broad task distribution with alleviated gradient conflicts. To enhance task-wise routing, we introduce a contrastive learning method that maximizes the mutual information between the task and its router representation, enabling more precise capture of task-relevant information. The outputs of two MoE components are concatenated and fed into the next layer. Comprehensive experiments show that T2MIR significantly facilitates in-context learning capacity and outperforms various types of baselines. We bring the potential and promise of MoE to ICRL, offering a simple and scalable architectural enhancement to advance ICRL one step closer toward achievements in language and vision communities. Our code is available at https://github.com/NJU-RL/T2MIR.
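
The routing idea behind a mixture-of-experts layer can be sketched as follows. The abstract does not state the gating rule used in T2MIR, so the top-k softmax gate below is an assumption (a common MoE choice), and the scalar "experts" and all names are hypothetical stand-ins for feedforward sub-networks.

```python
import math


def top_k_gate(logits, k):
    """Return {expert_index: weight}: softmax over the k largest router logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_output(x, experts, logits, k=2):
    """Weighted sum of the selected experts' outputs for a scalar input x."""
    gates = top_k_gate(logits, k)
    return sum(w * experts[i](x) for i, w in gates.items())


# Hypothetical experts: simple scalar functions standing in for networks.
experts = [lambda x: x, lambda x: 3 * x, lambda x: 100 * x]
router_logits = [0.0, 0.0, -10.0]
print(moe_output(2.0, experts, router_logits, k=2))
```

In the described architecture, one such router would operate per token and another per task, with the two outputs concatenated; only the generic gating step is shown here.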

3.3 AI Nov 17, 2025

Conditional Diffusion Model for Multi-Agent Dynamic Task Decomposition

Yanda Zhu, Yuanyang Zhu, Daoyi Dong et al.

Task decomposition has shown promise in complex cooperative multi-agent reinforcement learning (MARL) tasks, which enables efficient hierarchical learning for long-horizon tasks in dynamic and uncertain environments. However, learning dynamic task decomposition from scratch generally requires a large number of training samples, especially when exploring the large joint action space under partial observability. In this paper, we present the Conditional Diffusion Model for Dynamic Task Decomposition (CD$^3$T), a novel two-level hierarchical MARL framework designed to automatically infer subtask and coordination patterns. The high-level policy learns subtask representation to generate a subtask selection strategy based on subtask effects. To capture the effects of subtasks on the environment, CD$^3$T predicts the next observation and reward using a conditional diffusion model. At the low level, agents collaboratively learn and share specialized skills within their assigned subtasks. Moreover, the learned subtask representation is also used as additional semantic information in a multi-head attention mixing network to enhance value decomposition and provide an efficient reasoning bridge between individual and joint value functions. Experimental results on various benchmarks demonstrate that CD$^3$T achieves better performance than existing baselines.

1.2 SY Apr 20, 2024

Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context

Jianyu Xu, Qiuzhuang Sun, Yang Yang et al.

The 2019-20 Australian bushfires caused substantial economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected by bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the stochastic nature of bushfire spread, we develop a model to capture such dynamics based on Moore's neighborhood model. Under a periodic inspection scheme that reveals the in-situ bushfire status, we propose an online optimization modeling framework that sequentially plans the power flows in the electricity network. Our framework assumes that the spread of bushfires is non-stationary over time, and the spread and containment probabilities are unknown. To meet these challenges, we develop a contextual online learning algorithm that treats the in-situ geographical information of the bushfire as a 'spatial context'. The online learning algorithm learns the unknown probabilities sequentially based on the observed data and then makes the OPF decision accordingly. The sequential OPF decisions aim to minimize the regret function, which is defined as the cumulative loss against the clairvoyant strategy that knows the true model parameters. We provide a theoretical guarantee for our algorithm by deriving a bound on the regret function, which outperforms the regret bound achieved by other benchmark algorithms. Our model assumptions are verified with real bushfire data from NSW, Australia, and we apply our model to two power systems to illustrate its applicability.
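
A Moore-neighborhood spread model of the kind mentioned in this abstract can be sketched on a rectangular grid as below. This is a simplified illustration, not the paper's model: the constant probabilities `p_spread` and `p_contain`, the order of spread and containment within a period, and all function names are assumptions (the paper treats these probabilities as unknown and non-stationary).

```python
import random

# The eight offsets of the Moore neighbourhood (all adjacent cells,
# including diagonals).
MOORE_OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]


def moore_neighbors(cell, rows, cols):
    """Return the in-grid Moore neighbours of `cell` = (row, col)."""
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in MOORE_OFFSETS
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def spread_step(burning, rows, cols, p_spread, p_contain, rng):
    """One period: each burning cell may ignite each non-burning Moore
    neighbour, and each burning cell may independently be contained."""
    ignited = set()
    for cell in burning:
        for nb in moore_neighbors(cell, rows, cols):
            if nb not in burning and rng.random() < p_spread:
                ignited.add(nb)
    still_burning = {cell for cell in burning if rng.random() >= p_contain}
    return still_burning | ignited


rng = random.Random(0)
burning = {(2, 2)}
for _ in range(3):
    burning = spread_step(burning, rows=5, cols=5,
                          p_spread=0.3, p_contain=0.2, rng=rng)
print(sorted(burning))
```

An online learner of the kind described would observe the burning set at each inspection and use it to estimate the two probabilities rather than being given them.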

5.1 QUANT-PH May 9, 2023

Tomography of Quantum States from Structured Measurements via quantum-aware transformer

Hailan Ma, Zhenhong Sun, Daoyi Dong et al.

Quantum state tomography (QST) is the process of reconstructing the state of a quantum system (mathematically described as a density matrix) through a series of different measurements, which can be solved by learning a parameterized function to translate experimentally measured statistics into physical density matrices. However, the specific structure of quantum measurements for characterizing a quantum state has been neglected in previous work. In this paper, we explore the similarity between highly structured sentences in natural language and intrinsically structured measurements in QST. To fully leverage the intrinsic quantum characteristics involved in QST, we design a quantum-aware transformer (QAT) model to capture the complex relationship between measured frequencies and density matrices. In particular, we query quantum operators in the architecture to facilitate informative representations of quantum data and integrate the Bures distance into the loss function to evaluate quantum state fidelity, thereby enabling the reconstruction of quantum states from measured data with high fidelity. Extensive simulations and experiments (on IBM quantum computers) demonstrate the superiority of the QAT in reconstructing quantum states with favorable robustness against experimental noise.
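
For reference, the standard definitions of the fidelity and the Bures distance between two density matrices $\rho$ and $\sigma$ are given below. How exactly the Bures distance enters the QAT loss function is not stated in the abstract, so only the textbook definitions are shown.

```latex
F(\rho, \sigma) = \left( \operatorname{Tr} \sqrt{ \sqrt{\rho}\, \sigma \sqrt{\rho} } \right)^{2},
\qquad
d_B(\rho, \sigma) = \sqrt{ 2 \left( 1 - \sqrt{F(\rho, \sigma)} \right) }
```

The distance is zero exactly when $\rho = \sigma$ and reaches its maximum of $\sqrt{2}$ when the two states have orthogonal supports, so reducing it pushes the reconstructed state toward high fidelity with the target.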

4.4 LG Aug 19, 2021 Code

Residual Tensor Train: A Quantum-inspired Approach for Learning Multiple Multilinear Correlations

Yiwei Chen, Yu Pan, Daoyi Dong

States of quantum many-body systems are defined in a high-dimensional Hilbert space, where rich and complex interactions among subsystems can be modelled. In machine learning, complex multiple multilinear correlations may also exist within input features. In this paper, we present a quantum-inspired multilinear model, named Residual Tensor Train (ResTT), to capture the multiple multilinear correlations of features, from low to high orders, within a single model. ResTT is able to build a robust decision boundary in a high-dimensional space for solving fitting and classification tasks. In particular, we prove that the fully-connected layer and the Volterra series can be taken as special cases of ResTT. Furthermore, we derive the rule for weight initialization that stabilizes the training of ResTT based on a mean-field analysis. We prove that such a rule is much more relaxed than that of TT, which means ResTT can easily address the vanishing and exploding gradient problem that exists in the existing TT models. Numerical experiments demonstrate that ResTT outperforms the state-of-the-art tensor network and benchmark deep learning models on MNIST and Fashion-MNIST datasets. Moreover, ResTT achieves better performance than other statistical methods on two practical examples with limited data which are known to have complex feature interactions.

16.4 RO Apr 15, 2021

Rule-Based Reinforcement Learning for Efficient Robot Navigation with Space Reduction

Yuanyang Zhu, Zhi Wang, Chunlin Chen et al.

For real-world deployments, it is critical to allow robots to navigate in complex environments autonomously. Traditional methods usually maintain an internal map of the environment, and then design several simple rules, in conjunction with a localization and planning approach, to navigate through the internal map. These approaches often involve a variety of assumptions and prior knowledge. In contrast, recent reinforcement learning (RL) methods can provide a model-free, self-learning mechanism as the robot interacts with an initially unknown environment, but are expensive to deploy in real-world scenarios due to inefficient exploration. In this paper, we focus on efficient navigation with the RL technique and combine the advantages of these two kinds of methods into a rule-based RL (RuRL) algorithm for reducing the sample complexity and time cost. First, we use the rule of wall-following to generate a closed-loop trajectory. Second, we employ a reduction rule to shrink the trajectory, which in turn effectively reduces the redundant exploration space. We also give a detailed theoretical guarantee that the optimal navigation path is still in the reduced space. Third, in the reduced space, we utilize the Pledge rule to guide the exploration strategy for accelerating the RL process at the early stage. Experiments conducted on real robot navigation problems in hex-grid environments demonstrate that RuRL can achieve improved navigation performance.

6.6 CR Feb 20, 2021

Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks

Jianyu Xu, Bin Liu, Huadong Mo et al.

The cybersecurity of smart grids has become one of the key problems in developing reliable modern power and energy systems. This paper introduces a non-stationary adversarial cost with a variation constraint for smart grids, which enables us to investigate the problem of optimal smart grid protection against cyber attacks in a relatively practical scenario. In particular, a Bayesian multi-node bandit (MNB) model with adversarial costs is constructed and a new regret function is defined for this model. An algorithm called Thompson-Hedge is presented to solve the problem, and the superior performance of the proposed algorithm is proven in terms of the convergence rate of the regret function. The applicability of the algorithm to real smart grid scenarios is verified, and the performance of the algorithm is also demonstrated by numerical examples.
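
The Thompson-Hedge algorithm itself is not specified in the abstract. As background for its name, the sketch below shows only the classic Hedge (multiplicative weights) update for adversarial costs; the function name, the learning rate `eta`, and the example costs are assumptions for illustration.

```python
import math


def hedge_update(weights, costs, eta):
    """Classic Hedge step: w_i <- w_i * exp(-eta * cost_i), then renormalise."""
    scaled = [w * math.exp(-eta * c) for w, c in zip(weights, costs)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three hypothetical protection actions, starting from a uniform distribution.
weights = [1.0 / 3.0] * 3
# Hypothetical adversarial costs observed this round (action 0 was costly).
costs = [1.0, 0.0, 0.0]
weights = hedge_update(weights, costs, eta=math.log(2))
print(weights)
```

Actions that incur higher cost lose probability mass exponentially fast, which is what keeps regret against the best fixed action small when costs are chosen adversarially.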

14.6 LG Jan 6, 2021

Deep Reinforcement Learning with Quantum-inspired Experience Replay

Qing Wei, Hailan Ma, Chunlin Chen et al.

In this paper, a novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay. In contrast to the traditional experience replay mechanism in DRL, the proposed deep reinforcement learning with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences from the replay buffer according to the complexity and the replay count of each experience (also called a transition), to achieve a balance between exploration and exploitation. In DRL-QER, transitions are first formulated in quantum representations, and then the preparation operation and the depreciation operation are performed on the transitions. In this process, the preparation operation reflects the relationship between the temporal difference errors (TD-errors) and the importance of the experiences, while the depreciation operation is taken into account to ensure the diversity of the transitions. The experimental results on Atari 2600 games show that DRL-QER outperforms state-of-the-art algorithms such as DRL-PER and DCRL on most of these games with improved training efficiency, and is also applicable to memory-based DRL approaches such as double network and dueling network.

4.3 QUANT-PH Dec 31, 2020

Curriculum-based Deep Reinforcement Learning for Quantum Control

Hailan Ma, Daoyi Dong, Steven X. Ding et al.

Deep reinforcement learning has been recognized as an efficient technique to design optimal strategies for different complex systems without prior knowledge of the control landscape. To achieve fast and precise control of quantum systems, we propose a novel deep reinforcement learning approach by constructing a curriculum consisting of a set of intermediate tasks defined by a fidelity threshold. Tasks within a curriculum can be statically determined using empirical knowledge or adaptively generated during the learning process. By transferring knowledge between two successive tasks and sequencing tasks according to their difficulties, the proposed curriculum-based deep reinforcement learning (CDRL) method enables the agent to focus on easy tasks in the early stage, then move on to difficult tasks, and eventually approach the final task. Numerical simulations on closed quantum systems and open quantum systems demonstrate that the proposed method exhibits improved control performance for quantum systems and also provides an efficient way to identify optimal strategies with fewer control pulses.
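
The idea of a curriculum defined by fidelity thresholds can be sketched with a pure-state fidelity and a static list of thresholds. The threshold values, function names, and the rule for advancing between tasks are assumptions for illustration; the abstract does not give the paper's actual schedule or its adaptive variant.

```python
import math


def pure_state_fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two normalised pure states."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2


def current_task(fidelity, thresholds):
    """Index of the first threshold not yet reached;
    len(thresholds) once every intermediate task has been passed."""
    for i, threshold in enumerate(thresholds):
        if fidelity < threshold:
            return i
    return len(thresholds)


# Hypothetical static curriculum: easy target first, final target last.
thresholds = [0.9, 0.99, 0.999]

target = [1.0, 0.0]                                   # |0>
achieved = [1.0 / math.sqrt(2), 1.0 / math.sqrt(2)]   # |+>
f = pure_state_fidelity(target, achieved)
print(f, current_task(f, thresholds))
```

An agent trained this way would keep working on task `i` until its achieved fidelity exceeds `thresholds[i]`, then carry its learned parameters over to the next, stricter task.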

5.8LGOct 9, 2020Code

Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments

Zhi Wang, Chunlin Chen, Daoyi Dong

Evolution strategies (ES), a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available due to better parallelization. In this paper, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust the previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism with ES to facilitate its learning adaptation, while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move towards new promising areas of parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, Instance Weighted Incremental Evolution Strategies (IW-IES), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This paper thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
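
The idea of weighting instances inside an ES update can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the min-max weight rule, the function names, and the hyperparameters are all hypothetical. Novelty is taken here as the Euclidean distance from the previous optimum, and quality as the reward in the new environment.

```python
import random


def instance_weights(scores):
    """Min-max scale scores, then normalise so the weights average to 1.

    Assumed weighting rule; higher score means higher weight.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    scaled = [(s - lo) / (hi - lo) + 1e-3 for s in scores]
    mean = sum(scaled) / len(scaled)
    return [s / mean for s in scaled]


def weighted_es_step(theta, prev_optimum, reward_fn, metric="quality",
                     sigma=0.1, lr=0.05, pop=50, rng=random):
    """One ES update in which each sampled instance carries a weight."""
    dim = len(theta)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
    cands = [[t + sigma * e for t, e in zip(theta, eps)] for eps in noise]
    rewards = [reward_fn(c) for c in cands]

    if metric == "novelty":
        # Distance of each instance from the optimum of the old environment.
        scores = [sum((c - p) ** 2 for c, p in zip(cand, prev_optimum)) ** 0.5
                  for cand in cands]
    else:
        # Quality: how well the instance performs in the new environment.
        scores = rewards
    weights = instance_weights(scores)

    # Standardised rewards keep the step size insensitive to reward scale.
    mean_r = sum(rewards) / pop
    std_r = (sum((r - mean_r) ** 2 for r in rewards) / pop) ** 0.5 or 1.0
    adv = [(r - mean_r) / std_r for r in rewards]

    return [t + lr * sum(w * a * eps[j]
                         for w, a, eps in zip(weights, adv, noise)) / pop
            for j, t in enumerate(theta)]
```

Calling `weighted_es_step` repeatedly with the old policy parameters as both `theta` and `prev_optimum` moves the search distribution toward regions that score well in the changed environment, with the chosen metric deciding which samples dominate each update.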

2.8CLAug 23, 2020

Quantum Language Model with Entanglement Embedding for Question Answering

Yiwei Chen, Yu Pan, Daoyi Dong

Quantum Language Models (QLMs), in which words are modelled as a quantum superposition of sememes, have demonstrated a high level of model transparency and good post-hoc interpretability. Nevertheless, in the current literature word sequences are basically modelled as a classical mixture of word states, which cannot fully exploit the potential of a quantum probabilistic description. A full quantum model is yet to be developed to explicitly capture the non-classical correlations within the word sequences. We propose a neural network model with a novel Entanglement Embedding (EE) module, whose function is to transform the word sequences into entangled pure states of many-body quantum systems. Strong quantum entanglement, which is the central concept of quantum information and an indication of parallelized correlations among the words, is observed within the word sequences. Numerical experiments show that the proposed QLM with EE (QLM-EE) achieves superior performance compared with the classical deep neural network models and other QLMs on Question Answering (QA) datasets. In addition, the post-hoc interpretability of the model can be improved by quantifying the degree of entanglement among the words.

13.2LGJul 28, 2020Code

Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

Zhi Wang, Chunlin Chen, Daoyi Dong

A central capability of a long-lived reinforcement learning (RL) agent is to incrementally adapt its behavior as its environment changes, and to incrementally build upon previous experiences to facilitate future learning in real-world scenarios. In this paper, we propose LifeLong Incremental Reinforcement Learning (LLIRL), a new incremental algorithm for efficient lifelong adaptation to dynamic environments. We develop and maintain a library that contains an infinite mixture of parameterized environment models, which is equivalent to clustering environment parameters in a latent space. The prior distribution over the mixture is formulated as a Chinese restaurant process (CRP), which incrementally instantiates new environment models without any external information to signal environmental changes in advance. During lifelong learning, we employ the expectation maximization (EM) algorithm with online Bayesian inference to update the mixture in a fully incremental manner. In EM, the E-step involves estimating the posterior expectation of environment-to-cluster assignments, while the M-step updates the environment parameters for future learning. This method allows for all environment models to be adapted as necessary, with new models instantiated for environmental changes and old models retrieved when previously seen environments are encountered again. Experiments demonstrate that LLIRL outperforms relevant existing methods, and enables effective incremental adaptation to various dynamic environments for lifelong learning.
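
The Chinese restaurant process prior and the E-step mentioned above can be illustrated with a short sketch. The CRP probabilities below follow the standard definition (an existing cluster is chosen in proportion to its size, a new one in proportion to the concentration parameter). The function names and the example likelihood values are assumptions made for illustration.

```python
def crp_prior(counts, alpha):
    """CRP prior over cluster assignments.

    counts[k] is the number of past environments assigned to cluster k;
    the last entry of the result is the probability of a new cluster.
    """
    total = sum(counts) + alpha
    return [n / total for n in counts] + [alpha / total]


def posterior_assignment(counts, alpha, likelihoods):
    """E-step sketch: posterior is proportional to prior times likelihood.

    likelihoods must have one entry per existing cluster plus one for a
    potential new cluster.
    """
    prior = crp_prior(counts, alpha)
    if len(likelihoods) != len(prior):
        raise ValueError("need one likelihood per existing cluster plus one")
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Two existing environment models with 3 and 1 assignments; hypothetical
# likelihoods suggest the new data fits neither of them well.
post = posterior_assignment([3, 1], alpha=1.0, likelihoods=[0.1, 0.1, 0.9])
best = max(range(len(post)), key=post.__getitem__)
```

When the last index has the highest posterior, as in this example, a new environment model would be instantiated. Otherwise the matching existing model is retrieved and updated.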

10.8QUANT-PHMay 22, 2020

On compression rate of quantum autoencoders: Control design, numerical and experimental realization

Hailan Ma, Chang-Jiang Huang, Chunlin Chen et al.

Quantum autoencoders, which aim to compress quantum information into a low-dimensional latent space, lie at the heart of automatic data compression in the field of quantum information. In this paper, we establish an upper bound of the compression rate for a given quantum autoencoder and present a learning control approach for training the autoencoder to achieve the maximal compression rate. The upper bound of the compression rate is theoretically proven using eigen-decomposition and matrix differentiation, and is determined by the eigenvalues of the density matrix representation of the input states. Numerical results on 2-qubit and 3-qubit systems are presented to demonstrate how to train the quantum autoencoder to achieve the theoretically maximal compression, and the training performance using different machine learning algorithms is compared. Experimental results of a quantum autoencoder using quantum optical systems are illustrated for compressing two 2-qubit states into two 1-qubit states.
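
As a rough illustration of a bound that depends only on the eigenvalues of the input density matrix, the sketch below assumes the bound is the sum of the d largest eigenvalues, where d is the latent-space dimension. The abstract does not state this form, so it is an assumption, as are the function name and the example spectra.

```python
def compression_bound(eigenvalues, latent_dim):
    """Sum of the `latent_dim` largest eigenvalues (assumed form of the bound).

    `eigenvalues` is the spectrum of the input density matrix, so the values
    should be non-negative and sum to 1.
    """
    if latent_dim < 1:
        raise ValueError("latent_dim must be at least 1")
    top = sorted(eigenvalues, reverse=True)[:latent_dim]
    return sum(top)


# Hypothetical 2-qubit input ensemble (4 eigenvalues) compressed to 1 qubit
# (latent dimension 2).
bound = compression_bound([0.5, 0.3, 0.15, 0.05], latent_dim=2)

# An equal mixture of two orthogonal pure states has only two non-zero
# eigenvalues, so under this assumption it can be compressed without loss.
lossless = compression_bound([0.5, 0.5, 0.0, 0.0], latent_dim=2)
```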

1.2QUANT-PHSep 3, 2017

Achieving robust and high-fidelity quantum control via spectral phase optimization

Yu Guo, Daoyi Dong, Chuan-Cun Shu

Achieving high-fidelity control of quantum systems is of fundamental importance in physics, chemistry and quantum information sciences. However, the successful implementation of a high-fidelity quantum control scheme also requires robustness against control field fluctuations. Here, we demonstrate a robust optimization method for control of quantum systems by optimizing the spectral phase of an ultrafast laser pulse, which is accomplished in the framework of frequency domain quantum optimal control theory. By incorporating a frequency filtering function into the optimization algorithm, our numerical simulations in an abstract two-level quantum system as well as in a three-level atomic rubidium system show that the optimization procedure can be made to search for optimal solutions while achieving remarkable robustness against the control field fluctuations, providing an efficient approach to optimize the spectral phase of the ultrafast laser pulse to achieve a desired final quantum state of the system.

1.2SYSep 9, 2015

Coherent Robust H-Infinity Control of Uncertain Linear Quantum Stochastic Systems

Chengdi Xiang, Ian R. Petersen, Daoyi Dong

This paper considers a class of uncertain linear quantum systems subject to uncertain perturbations in the system Hamiltonian. We present a method to design a coherent robust H-infinity controller so that the closed-loop system is robustly stable and achieves a prescribed level of disturbance attenuation for all admissible uncertainties. An illustrative example shows that for the given system, the method presented in this paper has improved performance over existing quantum H-infinity control results that do not consider uncertainty.

1.2SYAug 11, 2015

Guaranteed Cost Dynamic Coherent Control for Uncertain Linear Quantum Systems

Chengdi Xiang, Ian R. Petersen, Daoyi Dong

This paper concerns a class of uncertain linear quantum systems subject to quadratic perturbations in the system Hamiltonian. A small gain approach is used to evaluate the performance of the given quantum system. In order to get improved control performance, we propose two methods to design a coherent controller for the system. One is to formulate a static quantum controller by adding a controller Hamiltonian to the given system, and the other is to build a dynamic quantum controller which is directly coupled to the given system. Both controller design methods are given in terms of LMIs and a non-convex equality. Hence, a rank constrained LMI method is used as a numerical procedure. An illustrative example is given to demonstrate the proposed methods and also to make a performance comparison with different controller design methods. Results show that for the same uncertain quantum system, the dynamic quantum controller can offer an improvement in performance over the static quantum controller.