9.1 QUANT-PH Apr 13 Code
Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDA-Q: Performance and Expressiveness Advantages
Vinooth Kulkarni, Jaehyun Lee, Adam Hutchings et al.
Dynamic quantum circuits with mid-circuit measurement and classical feedforward are essential for near-term algorithms such as error mitigation, adaptive phase estimation, and Variational Quantum Eigensolvers (VQE), yet transpiling these programs across frameworks remains challenging due to inconsistent support for control flow and measurement semantics. We present a transpilation pipeline that converts OpenQASM 3.0 programs with classical control structures (conditionals and bounded loops) into optimized CUDA-Q C++ kernels, leveraging CUDA-Q's native mid-circuit measurement and host-language control flow to translate dynamic patterns without static circuit expansion. Our open-source framework is validated on comprehensive test suites derived from IBM Quantum's classical feedforward guide, including conditional reset, if-else branching, multi-bit predicates, and sequential feedforward, and on VQE-style parameterized circuits with runtime parameter optimization. Experiments show that the resulting CUDA-Q kernels reduce circuit depth by avoiding branch duplication, improve execution efficiency via low-latency classical feedback, and enhance code readability by directly mapping OpenQASM 3.0 control structures to C++ control flow, thereby bridging OpenQASM 3.0's portable circuit specification with CUDA-Q's performance-oriented execution model for NISQ-era applications requiring dynamic circuit capabilities.
25.7 QUANT-PH Apr 13
QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting
Vinooth Kulkarni, Aaron Orenstein, Xinpeng Li et al.
The quantum computing community is increasingly positioning quantum processors as accelerators within classical HPC workflows, analogous to GPUs and TPUs. However, many real-world applications require scaling to hundreds or thousands of physical qubits to realize logical qubits via error correction. To reach these scales, hardware vendors employing diverse technologies (such as trapped ions, photonics, neutral atoms, and superconducting circuits) are moving beyond single, monolithic QPUs toward modular architectures connected via interconnects. For example, IonQ has proposed photonic links for scaling, while IBM has demonstrated a modular QPU architecture by classically linking two 127-qubit devices. Using dynamic circuits, Bell-pair-based teleportation, and circuit cutting, they have shown how to execute a large quantum circuit that cannot fit on a single QPU. As interest in quantum computing grows, cloud providers must ensure fair and efficient resource allocation for multiple users sharing such modular systems. Classical interconnection of QPUs introduces new scheduling challenges, particularly when multiple jobs execute in parallel. In this work, we develop a multi-programmable scheduler for modular quantum systems that jointly considers qubit mapping, parallel circuit execution, measurement synchronization across subcircuits, and teleportation operations between QPUs using dynamic circuits.
ST Mar 14, 2023
Improving CNN-base Stock Trading By Considering Data Heterogeneity and Burst
Keer Yang, Guanqun Zhang, Chuan Bi et al.
In recent years, there have been quite a few attempts to apply intelligent techniques to financial trading, i.e., constructing automatic and intelligent trading frameworks based on historical stock prices. Due to the unpredictable, uncertain, and volatile nature of financial markets, researchers have also resorted to deep learning to construct such frameworks. In this paper, we propose to use a CNN as the core of such a framework, because it is able to learn the spatial dependency (i.e., between rows and columns) of the input data. However, unlike existing deep learning-based trading frameworks, we develop a novel normalization process to prepare the stock data. In particular, we first empirically observe that stock data is intrinsically heterogeneous and bursty, and then validate the heterogeneous and bursty nature of stock data from a statistical perspective. Next, we design the data normalization method such that data heterogeneity is preserved and bursty events are suppressed. We evaluate our CNN-based trading framework together with the new normalization method on 29 stocks. Experimental results show that our approach can outperform the compared approaches.
CL Sep 16, 2025 Code
Empowering LLMs with Parameterized Skills for Adversarial Long-Horizon Planning
Sijia Cui, Shuai Xu, Aiyao He et al.
Recent advancements in Large Language Models (LLMs) have led to the development of LLM-based AI agents. A key challenge is the creation of agents that can effectively ground themselves in complex, adversarial long-horizon environments. Existing methods mainly focus on (1) using LLMs as policies to interact with the environment through generating low-level feasible actions, and (2) utilizing LLMs to generate high-level tasks or language guides to stimulate action generation. However, the former struggles to generate reliable actions, while the latter relies heavily on expert experience to translate high-level tasks into specific action sequences. To address these challenges, we introduce the Plan with Language, Act with Parameter (PLAP) planning framework that facilitates the grounding of LLM-based agents in long-horizon environments. The PLAP method comprises three key components: (1) a skill library containing environment-specific parameterized skills, (2) a skill planner powered by LLMs, and (3) a skill executor converting the parameterized skills into executable action sequences. We implement PLAP in MicroRTS, a long-horizon real-time strategy game that provides an unfamiliar and challenging environment for LLMs. The experimental results demonstrate the effectiveness of PLAP. In particular, GPT-4o-driven PLAP in a zero-shot setting outperforms 80% of baseline agents, and Qwen2-72B-driven PLAP, with carefully crafted few-shot examples, surpasses the top-tier scripted agent, CoacAI. Additionally, we design comprehensive evaluation metrics and test 6 closed-source and 2 open-source LLMs within the PLAP framework, ultimately releasing an LLM leaderboard ranking long-horizon skill planning ability. Our code is available at https://github.com/AI-Research-TeamX/PLAP.
CV May 15, 2025 Code
MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning
Yue Wang, Shuai Xu, Xuelin Zhu et al.
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen state-object combinations by leveraging known combinations. Existing studies largely rely on the cross-modal alignment capabilities of CLIP but tend to overlook its limitations in capturing fine-grained local features, which arise from its architecture and training paradigm. To address this issue, we propose a Multi-Stage Cross-modal Interaction (MSCI) model that effectively explores and utilizes intermediate-layer information from CLIP's visual encoder. Specifically, we design two self-adaptive aggregators to extract local information from low-level visual features and integrate global information from high-level visual features, respectively. This key information is progressively incorporated into textual representations through a stage-by-stage interaction mechanism, significantly enhancing the model's perception capability for fine-grained local visual information. Additionally, MSCI dynamically adjusts the attention weights between global and local visual information based on different combinations, as well as different elements within the same combination, allowing it to flexibly adapt to diverse scenarios. Experiments on three widely used datasets validate the effectiveness and superiority of the proposed model. Data and code are available at https://github.com/ltpwy/MSCI.
AI May 13, 2025 Code
Strategy-Augmented Planning for Large Language Models via Opponent Exploitation
Shuai Xu, Sijia Cui, Yanna Wang et al.
Efficiently modeling and exploiting opponents is a long-standing challenge in adversarial domains. Large Language Models (LLMs) trained on extensive textual data have recently demonstrated outstanding performance in general tasks, introducing new research directions for opponent modeling. Some studies focus on directly using LLMs to generate decisions based on an elaborate prompt context that incorporates opponent descriptions, but these approaches are limited to scenarios where LLMs possess adequate domain expertise. To address this, we introduce a two-stage Strategy-Augmented Planning (SAP) framework that significantly enhances the opponent exploitation capabilities of LLM-based agents by utilizing a critical component, the Strategy Evaluation Network (SEN). Specifically, in the offline stage, we construct an explicit strategy space and subsequently collect strategy-outcome pair data for training the SEN. During the online stage, SAP dynamically recognizes the opponent's strategies and greedily exploits them by searching for the best-response strategy on the well-trained SEN, finally translating the strategy into a course of actions through carefully designed prompts. Experimental results show that SAP exhibits robust generalization capabilities, allowing it to perform effectively not only against previously encountered opponent strategies but also against novel, unseen strategies. In the MicroRTS environment, SAP achieves an 85.35% performance improvement over baseline methods and matches the competitiveness of reinforcement learning approaches against state-of-the-art (SOTA) rule-based AI. Our code is available at https://github.com/hsushuai/SAP.
RO Mar 7
SwiftBot: A Decentralized Platform for LLM-Powered Federated Robotic Task Execution
YueMing Zhang, Shuai Xu, Zhengxiong Li et al.
Federated robotic task execution systems require bridging natural language instructions to distributed robot control while efficiently managing computational resources across heterogeneous edge devices without centralized coordination. Existing approaches face three limitations: rigid hand-coded planners requiring extensive domain engineering, centralized coordination that contradicts federated collaboration as robots scale, and static resource allocation failing to share containers across robots when workloads shift dynamically. We present SwiftBot, a federated task execution platform that integrates LLM-based task decomposition with intelligent container orchestration over a DHT overlay, enabling robots to collaboratively execute tasks without centralized control. SwiftBot achieves 94.3% decomposition accuracy across diverse tasks, reduces task startup latency by 1.5-5.4x and average training latency by 1.4-2.5x, and improves tail latency by 1.2-4.7x under high load through federated warm container migration. Evaluation on multimedia tasks validates that co-designing semantic understanding and federated resource management enables both flexibility and efficiency for robotic task control.
CL Aug 21, 2025
Self-Guided Function Calling in Large Language Models via Stepwise Experience Recall
Sijia Cui, Aiyao He, Shuai Xu et al.
Function calling enables large language models (LLMs) to interact with external systems by leveraging tools and APIs. When faced with multi-step tool usage, LLMs still struggle with tool selection, parameter generation, and tool-chain planning. Existing methods typically rely on manually designing task-specific demonstrations or retrieving from a curated library. These approaches demand substantial expert effort, and prompt engineering becomes increasingly complex and inefficient as tool diversity and task difficulty scale. To address these challenges, we propose a self-guided method, Stepwise Experience Recall (SEER), which performs fine-grained, stepwise retrieval from a continually updated experience pool. Instead of relying on a static or manually curated library, SEER incrementally augments the experience pool with past successful trajectories, enabling continuous expansion of the pool and improved model performance over time. Evaluated on the ToolQA benchmark, SEER achieves an average improvement of 6.1% on easy and 4.7% on hard questions. We further test SEER on τ-bench, which includes two real-world domains. Powered by Qwen2.5-7B and Qwen2.5-72B models, SEER demonstrates substantial accuracy gains of 7.44% and 23.38%, respectively.
CL Feb 9, 2024
Detection of Opioid Users from Reddit Posts via an Attention-based Bidirectional Recurrent Neural Network
Yuchen Wang, Zhengyu Fang, Wei Du et al.
The opioid epidemic, referring to the growing number of hospitalizations and deaths caused by opioid overdose and addiction, has become a severe health problem in the United States. Many strategies have been developed by the federal and local governments and health communities to combat this crisis. Among them, improving our understanding of the epidemic through better health surveillance is one of the top priorities. In addition to direct testing, machine learning approaches may also allow us to detect opioid users by analyzing data from social media, because many opioid users may choose not to be tested but may share their experiences on social media anonymously. In this paper, we take advantage of recent advances in machine learning to collect and analyze user posts from Reddit, a popular social network, with the goal of identifying opioid users. Posts from more than 1,000 users who have posted on three sub-reddits over a period of one month have been collected. In addition to posts that contain keywords such as opioid, opiate, or heroin, we have also collected posts that contain slang terms for opioids such as black or chocolate. We apply an attention-based bidirectional long short-term memory model to identify opioid users. Experimental results show that the approach significantly outperforms competitive algorithms in terms of F1-score. Furthermore, the model allows us to extract the most informative words, such as opiate, opioid, and black, from posts via the attention layer, which provides more insight into how the machine learning algorithm distinguishes drug users from non-drug users.
TR Feb 2, 2024
Learning the Market: Sentiment-Based Ensemble Trading Agents
Andrew Ye, James Xu, Vidyut Veedgav et al.
We propose and study the integration of sentiment analysis and deep reinforcement learning ensemble algorithms for stock trading by evaluating strategies capable of dynamically altering their active agent given the concurrent market environment. In particular, we design a simple yet effective method for extracting financial sentiment and combine this with improvements on existing trading agents, resulting in a strategy that effectively considers both qualitative market factors and quantitative stock data. We show that our approach results in a strategy that is profitable, robust, and risk-minimal, outperforming the traditional ensemble strategy as well as single-agent algorithms and market metrics. Our findings suggest that the conventional practice of switching and reevaluating agents in the ensemble every fixed number of months is sub-optimal, and that a dynamic sentiment-based framework unlocks substantial additional performance. Furthermore, as we have designed our algorithm with simplicity and efficiency in mind, we hypothesize that the transition of our method from historical evaluation to real-time trading with live data will be relatively simple.
CL May 13, 2025
TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers
Aiyao He, Sijia Cui, Shuai Xu et al.
Recently, large language models (LLMs) have played an increasingly important role in solving a wide range of NLP tasks, leveraging their capabilities in natural language understanding and generation. Integration with external tools further enhances LLMs' effectiveness, providing more precise, timely, and specialized responses. However, LLMs still encounter difficulties with non-executable actions and improper actions, which are primarily attributed to incorrect parameters. The process of generating parameters by LLMs is confined to the tool level, employing a coarse-grained strategy without considering the different difficulties of various tools. To address this issue, we propose TUMS, a novel framework designed to enhance the tool-use capabilities of LLMs by transforming tool-level processing into parameter-level processing. Specifically, our framework consists of four key components: (1) an intent recognizer that identifies the user's intent to help LLMs better understand the task; (2) a task decomposer that breaks down complex tasks into simpler subtasks, each involving a tool call; (3) a subtask processor equipped with multi-structure handlers to generate accurate parameters; and (4) an executor. Our empirical studies demonstrate the effectiveness and efficiency of the TUMS framework, with average improvements of 19.6% and 50.6% on the easy and hard benchmarks of ToolQA, respectively. We also demonstrate the key contribution of each part with ablation experiments, offering more insights and stimulating future research on tool-augmented LLMs.
LG May 24, 2023
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu, Guanchu Wang, Shaochen Zhong et al.
With the rapid growth in model size, fine-tuning large pre-trained language models has become increasingly difficult due to their extensive memory usage. Previous works usually focus on reducing the number of trainable parameters in the network. While the model parameters do contribute to memory usage, the primary memory bottleneck during training arises from storing feature maps, also known as activations, as they are crucial for gradient calculation. Notably, neural networks are usually trained using stochastic gradient descent. We argue that in stochastic optimization, models can handle noisy gradients as long as the gradient estimator is unbiased with reasonable variance. Following this motivation, we propose a new family of unbiased estimators for matrix multiplication with reduced variance, called WTA-CRS, which only requires storing the sub-sampled activations for calculating the gradient. Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones. By replacing the linear operation with our approximated one in transformers, we can achieve up to 2.7× peak memory reduction with almost no accuracy drop and enable up to 6.4× larger batch size. Under the same hardware, WTA-CRS enables better downstream task performance by applying larger models and/or faster training speed with larger batch sizes.
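The unbiased-estimator idea in this abstract can be illustrated with plain column-row sampling (CRS), the baseline that the method's name refers to. The following is a minimal sketch only: it is not the WTA-CRS estimator itself, the function names (`matmul`, `crs_probabilities`, `crs_estimate`) are hypothetical, and it assumes the common choice of sampling index i with probability proportional to the norm of column i of A times the norm of row i of B.

```python
import math
import random


def matmul(A, B):
    """Exact product of two matrices stored as lists of rows."""
    inner = len(B)
    return [[sum(A[r][i] * B[i][c] for i in range(inner))
             for c in range(len(B[0]))]
            for r in range(len(A))]


def crs_probabilities(A, B):
    """Sampling probabilities p_i proportional to ||A[:, i]|| * ||B[i, :]||.

    Assumes at least one column-row pair is nonzero.
    """
    weights = []
    for i in range(len(B)):
        col_norm = math.sqrt(sum(A[r][i] ** 2 for r in range(len(A))))
        row_norm = math.sqrt(sum(x ** 2 for x in B[i]))
        weights.append(col_norm * row_norm)
    total = sum(weights)
    if total == 0:
        raise ValueError("all column-row pairs are zero")
    return [w / total for w in weights]


def crs_estimate(A, B, k, rng):
    """Estimate A @ B from k sampled column-row pairs.

    Each sampled outer product A[:, i] B[i, :] is rescaled by 1 / (k * p_i),
    which makes the expectation of the estimate equal to the exact product.
    """
    probs = crs_probabilities(A, B)
    sampled = rng.choices(range(len(B)), weights=probs, k=k)
    rows, cols = len(A), len(B[0])
    estimate = [[0.0] * cols for _ in range(rows)]
    for i in sampled:
        scale = 1.0 / (k * probs[i])
        for r in range(rows):
            for c in range(cols):
                estimate[r][c] += scale * A[r][i] * B[i][c]
    return estimate


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    B = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(matmul(A, B))
    print(crs_estimate(A, B, 1000, random.Random(0)))
```

Only the sampled columns of A enter the estimate, which is the property the abstract relies on when it says that only sub-sampled activations need to be stored.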
CV Jan 24, 2021
Grad-CAM guided channel-spatial attention module for fine-grained visual classification
Shuai Xu, Dongliang Chang, Jiyang Xie et al.
Fine-grained visual classification (FGVC) is becoming an important research field, due to its wide applications and the rapid development of computer vision technologies. The current state-of-the-art (SOTA) methods in FGVC usually employ attention mechanisms to first capture the semantic parts and then discover the subtle differences between distinct classes. Channel-spatial attention mechanisms, which focus on the discriminative channels and regions simultaneously, have significantly improved classification performance. However, the existing attention modules are poorly guided, since part-based detectors in FGVC depend on the network's learning ability without the supervision of part annotations. As obtaining such part annotations is labor-intensive, visual localization and explanation methods, such as gradient-weighted class activation mapping (Grad-CAM), can be utilized for supervising the attention mechanism. We propose a Grad-CAM guided channel-spatial attention module for FGVC, which employs Grad-CAM to supervise and constrain the attention weights by generating coarse localization maps. To demonstrate the effectiveness of the proposed method, we conduct comprehensive experiments on three popular FGVC datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. The proposed method outperforms the SOTA attention modules in the FGVC task. In addition, visualizations of feature maps also demonstrate the superiority of the proposed method over the SOTA approaches.
LG Oct 16, 2020
Quantum-Inspired Classical Algorithm for Principal Component Regression
Daniel Chen, Yekun Xu, Betis Baheri et al.
This paper presents a sublinear classical algorithm for principal component regression. The algorithm uses quantum-inspired linear algebra, an idea developed by Tang. Using this technique, her algorithm for recommendation systems achieved a runtime only polynomially slower than its quantum counterpart. Her work was quickly adapted to solve many other problems in sublinear time. In this work, we develop an algorithm for principal component regression that runs in time polylogarithmic in the number of data points, an exponential speedup over the state-of-the-art algorithm, under the mild assumption that the input is given in a data structure that supports a norm-based sampling procedure. This exponential speedup allows for potential applications to much larger data sets.
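The "norm-based sampling procedure" assumed in this abstract can be illustrated as follows. This is a minimal sketch, not the paper's data structure: the class name `NormSampler` is hypothetical, and it assumes the usual length-squared convention in which index i of a vector v is drawn with probability v_i^2 / ||v||^2. A static prefix-sum array with binary search is used here for brevity; a structure that also supports fast updates would typically be tree-based.

```python
import bisect
import itertools
import random


class NormSampler:
    """Samples index i of a fixed vector with probability v[i]**2 / ||v||**2."""

    def __init__(self, vector):
        self.squares = [x * x for x in vector]
        # prefix[i] holds the sum of squares of entries 0..i.
        self.prefix = list(itertools.accumulate(self.squares))
        if not self.prefix or self.prefix[-1] == 0:
            raise ValueError("vector must contain a nonzero entry")
        self.norm_sq = self.prefix[-1]

    def probability(self, i):
        """Probability that sample() returns index i."""
        return self.squares[i] / self.norm_sq

    def sample(self, rng):
        """Draw one index in O(log n) time by binary search on the prefix sums."""
        u = rng.random() * self.norm_sq
        index = bisect.bisect_right(self.prefix, u)
        # Guard against a floating-point edge case at the upper boundary.
        return min(index, len(self.prefix) - 1)


if __name__ == "__main__":
    sampler = NormSampler([3.0, 0.0, 4.0])
    rng = random.Random(1)
    print([sampler.sample(rng) for _ in range(10)])
```

Entries with larger magnitude are drawn more often, and zero entries are never drawn, which is what lets sampling-based algorithms concentrate on the dominant parts of the input without reading all of it.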
ET Dec 19, 2016
A modified Physarum-inspired model for the user equilibrium traffic assignment problem
Shuai Xu, Wen Jiang, Yehang Shou
The user equilibrium principle is central to the traffic assignment problem. Traditional algorithms solve the user equilibrium problem with mathematical programming models. Recently, the Physarum model has shown the ability to address the user equilibrium and system optimization traffic assignment problems. However, the Physarum model is not efficient in real traffic networks with two-way traffic characteristics and multiple origin-destination pairs. In this article, a modified Physarum-inspired model for the user equilibrium problem is proposed. By decomposing traffic flux based on origin nodes, the traffic flux from different origin-destination pairs can be distinguished in the proposed model. The Physarum can obtain the equilibrium traffic flux when no shorter path can be discovered between each origin-destination pair. Finally, numerical examples on the Sioux Falls network demonstrate the rationality and convergence properties of the proposed model.