CLApr 29, 2022
QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware RelevanceXiaoqiang Wang, Bang Liu, Siliang Tang et al. · mila
Existing metrics for assessing question generation not only require costly human reference but also fail to take into account the input context of generation, rendering the lack of deep understanding of the relevance between the generated questions and input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (i) involves complicated reasoning with the context or (ii) can be grounded by multiple evidences in the context. In this paper, we propose $\textbf{QRelScore}$, a context-aware $\underline{\textbf{Rel}}$evance evaluation metric for $\underline{\textbf{Q}}$uestion Generation. Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with the complicated reasoning and diverse generation from multiple evidences, respectively. Compared with existing metrics, our experiments demonstrate that QRelScore is able to achieve a higher correlation with human judgments while being much more robust to adversarial samples.
CLMar 5, 2022
Feeding What You Need by Understanding What You LearnedXiaoqiang Wang, Bang Liu, Fangli Xu et al. · mila
Machine Reading Comprehension (MRC) reveals the ability to understand a given text passage and answer questions based on it. Existing research works in MRC rely heavily on large-size models and corpus to improve the performance evaluated by metrics such as Exact Match ($EM$) and $F_1$. However, such a paradigm lacks sufficient interpretation to model capability and can not efficiently train a model with a large corpus. In this paper, we argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data based on its learning status. Specifically, we design an MRC capability assessment framework that assesses model capabilities in an explainable and multi-dimensional manner. Based on it, we further uncover and disentangle the connections between various data properties and model performance. Finally, to verify the effectiveness of the proposed MRC capability assessment framework, we incorporate it into a curriculum learning pipeline and devise a Capability Boundary Breakthrough Curriculum (CBBC) strategy, which performs a model capability-based training to maximize the data value and improve training efficiency. Extensive experiments demonstrate that our approach significantly improves performance, achieving up to an 11.22% / 8.71% improvement of $EM$ / $F_1$ on MRC tasks.
99.2ITJun 1
Four constructions of self-dual binary cyclic codes with a lower bound on the minimum distances better than the square-root boundXiaoqiang Wang, Xun Song, Dabin Zheng et al.
In spite of the intensive study of cyclic codes and the recent construction of an infinite family of self-dual binary cyclic codes whose minimum distances have the square-root bound in IEEE Trans. IT, vol. 71, no. 4, 2025, it is still a 70-year-old open problem whether there is an infinite family of self-dual binary cyclic codes whose minimum distances have a lower bound better than the square-root bound. This paper settles this long-standing open problem in coding theory by presenting infinite families of such self-dual binary cyclic codes. As by-products, several families of cyclic codes with better parameters than those in some references are also constructed in this paper.
CLMar 2, 2022
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition SystemsXiaoqiang Wang, Yanqing Liu, Jinyu Li et al.
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems, which aims to achieve better recognition performance by biasing the ASR system to particular context phrases such as person names, music list, proper nouns, etc. Existing methods mainly include contextual LM biasing and adding bias encoder into end-to-end ASR models. In this work, we introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. Our proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model. We demonstrate the proposed model is a general biasing solution which is domain-insensitive and can be adopted in different scenarios. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over ASR system and outperforms traditional biasing methods. Compared to the AR solution, the proposed NAR model reduces model size by 43.2% and speeds up inference by 2.1 times.
SDFeb 22, 2023
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data AugmentationXiaoqiang Wang, Yanqing Liu, Jinyu Li et al.
We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models with contextual information such as name, place, etc. Although CSC has achieved reasonable improvement in the biasing problem, there are still two drawbacks for further accuracy improvement. First, due to information limitation in text only hypothesis or weak performance of ASR model on rare domains, the CSC model may fail to correct phrases with similar pronunciation or anti-context cases where all biasing phrases are not present in the utterance. Second, there is a discrepancy between the training and inference of CSC. The bias list in training is randomly selected but in inference there may be more similarity between ground truth phrase and other phrases. To solve above limitations, in this paper we propose an improved non-autoregressive (NAR) spelling correction model for contextual biasing in E2E neural transducer-based ASR systems to improve the previous CSC model from two perspectives: Firstly, we incorporate acoustics information with an external attention as well as text hypotheses into CSC to better distinguish target phrase from dissimilar or irrelevant phrases. Secondly, we design a semantic aware data augmentation schema in training phrase to reduce the mismatch between training and inference to further boost the biasing accuracy. Experiments show that the improved method outperforms the baseline ASR+Biasing system by as much as 20.3% relative name recall gain and achieves stable improvement compared to the previous CSC method over different bias list name coverage ratio.
87.7AIMay 22
Foundation Protocol: A Coordination Layer for Agentic SocietyBang Liu, Yongfeng Gu, Jiayi Zhang et al.
Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and increasingly interact with one another. As these systems scale, the bottleneck shifts away from raw model capability toward coordination. Agents need to form reliable relationships, organize multi-agent work, exchange value, support an AI economy, and stay safe and accountable under real-world oversight. This paper introduces the Foundation Protocol (FP), a graph-first coordination layer for an emerging human-AI society. FP unifies heterogeneous entities, including agents, tools, resources, humans, institutions, and organizations, and supports native multi-party organization and event-based collaboration. It also provides economic primitives for metering, receipts, and settlement, and treats policy, provenance, and audit as first-class concerns. FP is designed to wrap and bridge existing protocols rather than replace them, enabling incremental adoption while reducing integration and governance overhead. The aim is to keep autonomous agency composable while keeping accountability non-negotiable, so that coordination itself can become shared infrastructure for a human-AI society that is open, pluralistic, and governable.
AIOct 24, 2024Code
OSCAR: Operating System Control via State-Aware Reasoning and Re-PlanningXiaoqiang Wang, Bang Liu · mila
Large language models (LLMs) and large multimodal models (LMMs) have shown great potential in automating complex tasks like web browsing and gaming. However, their ability to generalize across diverse applications remains limited, hindering broader utility. To address this challenge, we present OSCAR: Operating System Control via state-Aware reasoning and Re-planning. OSCAR is a generalist agent designed to autonomously navigate and interact with various desktop and mobile applications through standardized controls, such as mouse and keyboard inputs, while processing screen images to fulfill user commands. OSCAR translates human instructions into executable Python code, enabling precise control over graphical user interfaces (GUIs). To enhance stability and adaptability, OSCAR operates as a state machine, equipped with error-handling mechanisms and dynamic task re-planning, allowing it to efficiently adjust to real-time feedback and exceptions. We demonstrate OSCAR's effectiveness through extensive experiments on diverse benchmarks across desktop and mobile platforms, where it transforms complex workflows into simple natural language commands, significantly boosting user productivity. Our code will be open-source upon publication.
97.6CLMay 20
Mem-$π$: Adaptive Memory through Learning When and What to GenerateXiaoqiang Wang, Chao Wang, Hadi Nekoei et al.
We present Mem-$π$, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-$π$ uses a dedicated language or vision-language model with its own parameters, separate from the downstream agent, to generate context-specific guidance for complex tasks. Conditioned on the current agent context, the model jointly decides when to produce guidance and what guidance to produce. We train it with a decision-content decoupled reinforcement learning (RL) objective, enabling it to abstain when generation would not help and otherwise produce concise, useful guidance. Across diverse agentic benchmarks spanning web navigation, terminal-based tool use, and text-based embodied interaction, Mem-$π$ consistently outperforms retrieval-based and prior RL-optimized memory baselines, achieving over 30% relative improvement on web navigation tasks.
73.6ITMar 31
A Structural Characterization of Cyclotomic Cosets with Applications to Affine-Invariant Codes and BCH CodesXiongkun Zheng, Dabin Zheng, Xiaoqiang Wang et al.
Affine-invariant codes have attracted considerable attention due to their rich algebraic structure and strong theoretical properties. In this paper, we study a family of affine-invariant codes whose defining set consists of all descendants of elements in the cyclotomic coset of a single specified element. Our main contributions are as follows. First, we establish a new combinatorial result that determines exactly the size of such descendant sets, which is of independent interest in the study of cyclotomic cosets. Second, using this result, we derive explicit formulas for the dimensions of the corresponding affine-invariant codes and their associated cyclic codes, and we establish lower bounds on the minimum distances of their duals. In particular, under appropriate parameter choices, these codes yield narrow-sense primitive BCH codes and their extended counterparts. For the special class of narrow-sense primitive BCH codes with designed distance $δ= (b+1)q^{m-t-1}$, where $1 \leq b \leq q-1$ and $0 \leq t \leq m-1$, we provide exact dimension formulas and an improved lower bound on the minimum distance. The results presented here extend and sharpen several previously known results, and provide refined tools for the parametric analysis of BCH codes and their duals.
AIMar 31, 2025
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe SystemsBang Liu, Xinfeng Li, Jiayi Zhang et al. · microsoft-research
The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate, multifaceted challenges. This book provides a comprehensive overview, framing intelligent agents within modular, brain-inspired architectures that integrate principles from cognitive science, neuroscience, and computational research. We structure our exploration into four interconnected parts. First, we systematically investigate the modular foundation of intelligent agents, systematically mapping their cognitive, perceptual, and operational modules onto analogous human brain functionalities and elucidating core components such as memory, world modeling, reward processing, goal, and emotion. Second, we discuss self-enhancement and adaptive evolution mechanisms, exploring how agents autonomously refine their capabilities, adapt to dynamic environments, and achieve continual learning through automated optimization paradigms. Third, we examine multi-agent systems, investigating the collective intelligence emerging from agent interactions, cooperation, and societal structures. Finally, we address the critical imperative of building safe and beneficial AI systems, emphasizing intrinsic and extrinsic security threats, ethical alignment, robustness, and practical mitigation strategies necessary for trustworthy real-world deployment. By synthesizing modular AI architectures with insights from different disciplines, this survey identifies key research challenges and opportunities, encouraging innovations that harmonize technological advancement with meaningful societal benefit.
LGJul 15, 2024
ISMRNN: An Implicitly Segmented RNN Method with Mamba for Long-Term Time Series ForecastingGaoXiang Zhao, Li Zhou, XiaoQiang Wang
Long time series forecasting aims to utilize historical information to forecast future states over extended horizons. Traditional RNN-based series forecasting methods struggle to effectively address long-term dependencies and gradient issues in long time series problems. Recently, SegRNN has emerged as a leading RNN-based model tailored for long-term series forecasting, demonstrating state-of-the-art performance while maintaining a streamlined architecture through innovative segmentation and parallel decoding techniques. Nevertheless, SegRNN has several limitations: its fixed segmentation disrupts data continuity and fails to effectively leverage information across different segments, the segmentation strategy employed by SegRNN does not fundamentally address the issue of information loss within the recurrent structure. To address these issues, we propose the ISMRNN method with three key enhancements: we introduce an implicit segmentation structure to decompose the time series and map it to segmented hidden states, resulting in denser information exchange during the segmentation phase. Additionally, we incorporate residual structures in the encoding layer to mitigate information loss within the recurrent structure. To extract information more effectively, we further integrate the Mamba architecture to enhance time series information extraction. Experiments on several real-world long time series forecasting datasets demonstrate that our model surpasses the performance of current state-of-the-art models.
CLMay 25, 2025
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic ShortcutsXiaoqiang Wang, Suyuchen Wang, Yun Zhu et al.
Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning. However, this comes at the cost of significant inefficiency due to verbose intermediate output. Recent latent-space reasoning methods improve efficiency by operating on hidden states without decoding into language, yet they treat all steps uniformly, failing to distinguish critical deductions from auxiliary steps and resulting in suboptimal use of computational resources. In this paper, we propose System-1.5 Reasoning, an adaptive reasoning framework that dynamically allocates computation across reasoning steps through shortcut paths in latent space. Specifically, System-1.5 Reasoning introduces two types of dynamic shortcuts. The model depth shortcut (DS) adaptively reasons along the vertical depth by early exiting non-critical tokens through lightweight adapter branches, while allowing critical tokens to continue through deeper Transformer layers. The step shortcut (SS) reuses hidden states across the decoding steps to skip trivial steps and reason horizontally in latent space. Training System-1.5 Reasoning involves a two-stage self-distillation process: first distilling natural language CoT into latent-space continuous thought, and then distilling full-path System-2 latent reasoning into adaptive shortcut paths (System-1.5 Reasoning). Experiments on reasoning tasks demonstrate the superior performance of our method. For example, on GSM8K, System-1.5 Reasoning achieves reasoning performance comparable to traditional CoT fine-tuning methods while accelerating inference by over 20x and reducing token generation by 92.31% on average.
LGNov 5, 2025
Distillation-Accelerated Uncertainty Modeling for Multi-Objective RTA InterceptionGaoxiang Zhao, Ruina Qiu, Pengpeng Zhao et al.
Real-Time Auction (RTA) Interception aims to filter out invalid or irrelevant traffic to enhance the integrity and reliability of downstream data. However, two key challenges remain: (i) the need for accurate estimation of traffic quality together with sufficiently high confidence in the model's predictions, typically addressed through uncertainty modeling, and (ii) the efficiency bottlenecks that such uncertainty modeling introduces in real-time applications due to repeated inference. To address these challenges, we propose DAUM, a joint modeling framework that integrates multi-objective learning with uncertainty modeling, yielding both traffic quality predictions and reliable confidence estimates. Building on DAUM, we further apply knowledge distillation to reduce the computational overhead of uncertainty modeling, while largely preserving predictive accuracy and retaining the benefits of uncertainty estimation. Experiments on the JD advertisement dataset demonstrate that DAUM consistently improves predictive performance, with the distilled model delivering a tenfold increase in inference speed.
CLFeb 21, 2025
R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible CompressionXiaoqiang Wang, Suyuchen Wang, Yun Zhu et al.
Memory plays a key role in enhancing LLMs' performance when deployed to real-world applications. Existing solutions face trade-offs: explicit memory designs based on external storage require complex management and incur storage overhead, while implicit memory designs that store information via parameters struggle with reliable retrieval. In this paper, we propose R$^3$Mem, a memory network that optimizes both information Retention and Retrieval through Reversible context compression. Specifically, R$^3$Mem employs virtual memory tokens to compress and encode infinitely long histories, further enhanced by a hierarchical compression strategy that refines information from document- to entity-level for improved assimilation across granularities. For retrieval, R$^3$Mem employs a reversible architecture, reconstructing raw data by invoking the model backward with compressed information. Implemented via parameter-efficient fine-tuning, it can integrate seamlessly with any Transformer-based model. Experiments demonstrate that our memory design achieves state-of-the-art performance in long-context language modeling and retrieval-augmented generation tasks. It also significantly outperforms conventional memory modules in long-horizon interaction tasks like conversational agents, showcasing its potential for next-generation retrieval systems.
LGJun 16, 2025
Accelerating PDE-Constrained Optimization by the Derivative of Neural OperatorsZe Cheng, Zhuoyu Li, Xiaoqiang Wang et al.
PDE-Constrained Optimization (PDECO) problems can be accelerated significantly by employing gradient-based methods with surrogate models like neural operators compared to traditional numerical solvers. However, this approach faces two key challenges: (1) **Data inefficiency**: Lack of efficient data sampling and effective training for neural operators, particularly for optimization purpose. (2) **Instability**: High risk of optimization derailment due to inaccurate neural operator predictions and gradients. To address these challenges, we propose a novel framework: (1) **Optimization-oriented training**: we leverage data from full steps of traditional optimization algorithms and employ a specialized training method for neural operators. (2) **Enhanced derivative learning**: We introduce a *Virtual-Fourier* layer to enhance derivative learning within the neural operator, a crucial aspect for gradient-based optimization. (3) **Hybrid optimization**: We implement a hybrid approach that integrates neural operators with numerical solvers, providing robust regularization for the optimization process. Our extensive experimental results demonstrate the effectiveness of our model in accurately learning operators and their derivatives. Furthermore, our hybrid optimization approach exhibits robust convergence.
LGDec 30, 2024
AverageTime: Enhance Long-Term Time Series Forecasting with Simple AveragingGaoxiang Zhao, Li Zhou, Xiaoqiang Wang
Long-term time series forecasting focuses on leveraging historical data to predict future trends. The core challenge lies in effectively modeling dependencies both within sequences and channels. Convolutional Neural Networks and Linear models often excel in sequence modeling but frequently fall short in capturing complex channel dependencies. In contrast, Transformer-based models, with their attention mechanisms applied to both sequences and channels, have demonstrated strong predictive performance. Our research proposes a new approach for capturing sequence and channel dependencies: AverageTime, an exceptionally simple yet effective structure. By employing mixed channel embedding and averaging operations, AverageTime separately captures correlations for sequences and channels through channel mapping and result averaging. In addition, we integrate clustering methods to further accelerate the model's training process. Experiments on real-world datasets demonstrate that AverageTime surpasses state-of-the-art models in predictive performance while maintaining efficiency comparable to lightweight linear models. This provides a new and effective framework for modeling long time series.
CLFeb 29, 2024
FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and CognitionXiaoqiang Wang, Lingfei Wu, Tengfei Ma et al. · mila
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks. However, such a paradigm fails to comprehensively differentiate the fine-grained language and cognitive skills, rendering the lack of sufficient interpretation to LLMs' capabilities. In this paper, we present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation. Specifically, we formulate LLMs' evaluation in a multi-dimensional and explainable manner by dissociating the language-related capabilities and the cognition-related ones. Besides, through extracting the intermediate reasoning from LLMs, we further break down the process of applying a specific capability into three sub-steps: recalling relevant knowledge, utilizing knowledge, and solving problems. Finally, FAC$^2$E evaluates each sub-step of each fine-grained capability, providing a two-faceted diagnosis for LLMs. Utilizing FAC$^2$E, we identify a common shortfall in knowledge utilization among models and propose a straightforward, knowledge-enhanced method to mitigate this issue. Our results not only showcase promising performance enhancements but also highlight a direction for future LLM advancements.
CLMay 8, 2023
SkillQG: Learning to Generate Question for Reading Comprehension AssessmentXiaoqiang Wang, Bang Liu, Siliang Tang et al.
We present $\textbf{$\texttt{SkillQG}$}$: a question generation framework with controllable comprehension types for assessing and improving machine reading comprehension models. Existing question generation systems widely differentiate questions by $\textit{literal}$ information such as question words and answer types to generate semantically relevant questions for a given context. However, they rarely consider the $\textit{comprehension}$ nature of questions, i.e. the different comprehension capabilities embodied by different questions. In comparison, our $\texttt{SkillQG}$ is able to tailor a fine-grained assessment and improvement to the capabilities of question answering models built on it. Specifically, we first frame the comprehension type of questions based on a hierarchical skill-based schema, then formulate $\texttt{SkillQG}$ as a skill-conditioned question generator. Furthermore, to improve the controllability of generation, we augment the input text with question focus and skill-specific knowledge, which are constructed by iteratively prompting the pre-trained language models. Empirical results demonstrate that $\texttt{SkillQG}$ outperforms baselines in terms of quality, relevance, and skill-controllability while showing a promising performance boost in downstream question answering task.
IVJan 1, 2022
Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB ImagesXiaoqiang Wang, Lei Zhu, Siliang Tang et al.
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data. The saliency detection branch is used to fuse the RGB feature and depth feature to predict the RGB-D saliency. Then, the whole DDCNN is assigned as the backbone in a teacher-student framework for semi-supervised learning. Moreover, we also introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as a supervised depth and saliency loss for labeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve the performance, even when using an RGB image with the pseudo depth map.
CLAug 17, 2021
A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systemsXiaoqiang Wang, Yanqing Liu, Sheng Zhao et al.
It's challenging to customize transducer-based automatic speech recognition (ASR) system with context information which is dynamic and unavailable during model training. In this work, we introduce a light-weight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. We incorporate the context information into the spelling correction model with a shared context encoder and use a filtering algorithm to handle large-size context lists. Experiments show that the model improves baseline ASR model performance with about 50% relative word error rate reduction, which also significantly outperforms the baseline method such as contextual LM biasing. The model also shows excellent performance for out-of-vocabulary terms not seen during training.
LGMay 14, 2021
Ordering-Based Causal Discovery with Reinforcement LearningXiaoqiang Wang, Yali Du, Shengyu Zhu et al.
It is a long-standing question to discover causal relations among a set of variables in many empirical sciences. Recently, Reinforcement Learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity by implicit penalties tend to be inefficient and restrict the existing RL-based method to small scale problems. In this work, we propose a novel RL-based approach for causal discovery, by incorporating RL into the ordering-based paradigm. Specifically, we formulate the ordering search problem as a multi-step Markov decision process, implement the ordering generating process with an encoder-decoder architecture, and finally use RL to optimize the proposed model based on the reward mechanisms designed for~each ordering. A generated ordering would then be processed using variable selection to obtain the final causal graph. We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training. Experimental results on both synthetic and real data sets shows that the proposed method achieves a much improved performance over existing RL-based method.
QUANT-PHOct 3, 2019
Quantum tensor singular value decomposition with applications to recommendation systemsXiaoqiang Wang, Lejia Gu, Joseph Heung-wing Joseph Lee et al.
In this paper, we present a quantum singular value decomposition algorithm for third-order tensors inspired by the classical algorithm of tensor singular value decomposition (t-svd) and then extend it to order-$p$ tensors. It can be proved that the quantum version of the t-svd for a third-order tensor $\mathcal{A} \in \mathbb{R}^{N\times N \times N}$ achieves the complexity of $\mathcal{O}(N{\rm polylog}(N))$, an exponential speedup compared with its classical counterpart. As an application, we propose a quantum algorithm for recommendation systems which incorporates the contextual situation of users to the personalized recommendation. We provide recommendations varying with contexts by measuring the output quantum state corresponding to an approximation of this user's preferences. This algorithm runs in expected time $\mathcal{O}(N{\rm polylog}(N){\rm poly}(k)),$ if every frontal slice of the preference tensor has a good rank-$k$ approximation. At last, we provide a quantum algorithm for tensor completion based on a different truncation method which is tested to have a good performance in dynamic video completion.
LGAug 10, 2019
Large-Scale Traffic Signal Control Using a Novel Multi-Agent Reinforcement LearningXiaoqiang Wang, Liangjun Ke, Zhimin Qiao et al.
Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multi-Agent Reinforcement Learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this paper, a new MARL, called Cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the UCB policy, which can eliminate the over-estimation problem existing in traditional independent Q-learning while ensuring exploration. It uses mean field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. In order to improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied on TSC and tested on a multi-traffic signal simulator. According to the results obtained on several traffic scenarios, Co- DQL outperforms several state-of-the-art decentralized MARL algorithms. It can effectively shorten the average waiting time of the vehicles in the whole road system.