Changling Li

LG
h-index6
7papers
30citations
Novelty50%
AI Score50

7 Papers

LGJul 4, 2024Code
ROER: Regularized Optimal Experience Replay

Changling Li, Zhang-Wei Hong, Pulkit Agrawal et al.

Experience replay serves as a key component in the success of online reinforcement learning (RL). Prioritized experience replay (PER) reweights experiences by the temporal difference (TD) error empirically enhancing the performance. However, few works have explored the motivation of using TD error. In this work, we provide an alternative perspective on TD-error-based reweighting. We show the connections between the experience prioritization and occupancy optimization. By using a regularized RL objective with $f-$divergence regularizer and employing its dual form, we show that an optimal solution to the objective is obtained by shifting the distribution of off-policy data in the replay buffer towards the on-policy optimal distribution using TD-error-based occupancy ratios. Our derivation results in a new pipeline of TD error prioritization. We specifically explore the KL divergence as the regularizer and obtain a new form of prioritization scheme, the regularized optimal experience replay (ROER). We evaluate the proposed prioritization scheme with the Soft Actor-Critic (SAC) algorithm in continuous control MuJoCo and DM Control benchmark tasks where our proposed scheme outperforms baselines in 6 out of 11 tasks while the results of the rest match with or do not deviate far from the baselines. Further, using pretraining, ROER achieves noticeable improvement on difficult Antmaze environment where baselines fail, showing applicability to offline-to-online fine-tuning. Code is available at \url{https://github.com/XavierChanglingLi/Regularized-Optimal-Experience-Replay}.

LGJul 16, 2022
Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System

Mingjie Li, Rui Liu, Guangsi Shi et al.

Long-term time-series forecasting (LTSF) is fundamental to various real-world applications, where Transformer-based models have become the dominant framework due to their ability to capture long-range dependencies. However, these models often experience overfitting due to data redundancy in rolling forecasting settings, limiting their generalization ability particularly evident in longer sequences with highly similar adjacent data. In this work, we introduce CLMFormer, a novel framework that mitigates redundancy through curriculum learning and a memory-driven decoder. Specifically, we progressively introduce Bernoulli noise to the training samples, which effectively breaks the high similarity between adjacent data points. This curriculum-driven noise introduction aids the memory-driven decoder by supplying more diverse and representative training data, enhancing the decoder's ability to model seasonal tendencies and dependencies in the time-series data. To further enhance forecasting accuracy, we introduce a memory-driven decoder. This component enables the model to capture seasonal tendencies and dependencies in the time-series data and leverages temporal relationships to facilitate the forecasting process. Extensive experiments on six real-world LTSF benchmarks show that CLMFormer consistently improves Transformer-based models by up to 30%, demonstrating its effectiveness in long-horizon forecasting.

NIMay 24
Scaling up Energy-Aware Multi-Agent Reinforcement Learning for Mission-Oriented Drone Networks with Individual Reward

Changling Li, Ying Li

Multi-agent reinforcement learning (MARL) has shown wide applicability in collaborative systems such as autonomous driving and smart cities for its ability of learning through interaction. With the recent development of drone networks, researchers have also applied MARL to address the trajectory planning problems. However, the dynamic environment and the limited battery capacity are still challenging for using MARL to achieve efficient collaborative task execution. In this paper, we propose an energy-aware MARL model as an attempt to tackle these challenges, leveraging Deep Q-Networks (DQN) with \emph{individual reward functions} driven by the task execution progress and the remaining battery of drones. We conduct a set of simulation studies for the proposed mode and compare it with the shared reward MARL~\cite{Li2022MARL} to explore the impact of credit assignment in MARL. The results indicate that our proposed model can achieve at least 80\% success rate regardless of the task locations and lengths. Similar to the shared reward mode, the individual reward mode can achieve a better success rate when the task density is high, and it can hit nearly a 100\% success rate when task density gets close to 40\%. The true advantage of our proposed model with individual reward is revealed when scaling up the environment. The comparison to the shared reward MARL shows that the our proposed model is more robust towards the change of the environment size and agent numbers. It can achieve higher success rate with fewer steps due to the clarity of the goal which improves energy efficiency even better.

LGMay 21
Decomposing and Measuring Evaluation Awareness

Changling Li, Terry Jingchen Zhang, Jie Zhang et al.

Frontier language models sometimes recognize that they are being evaluated and adjust their behavior, undermining validity of benchmark results. Yet the field studies it without a shared foundation, conflating properties of the evaluation with properties of the model, and detection with behavioral response. We ground evaluation awareness in social psychology, decomposing it into an environment component (how recognizable the task is) and a model component that separates recognition from propensity to act on it. We operationalize the environment component through eight categorized trigger factors, such as placeholder entities and grading-style output formats, and study recognition and behavior through chain-of-thought monitoring. Across nine frontier models and four benchmarks, recognition rates depend on the specific pairing of model and benchmark rather than on either in isolation. Recognition rarely leads to behavioral change, and when it does, the direction depends on the type of evaluation perceived. Models are also more sensitive to safety than capability evaluations, placing safety benchmark validity at greater risk. To study which factors each model is sensitive to and how they interact, we propose \textbf{EvalAwareBench}, a factor-controlled benchmark of 100 paired safety-capability tasks where each of the eight factors can be independently toggled, varying evaluative signals while holding the underlying request fixed. Through EvalAwareBench, we find that no single factor uniformly affects all models, but stacking factors progressively raises evaluation awareness across all of them. Our framework and EvalAwareBench provide the tools to measure, attribute, and mitigate evaluation awareness, pointing to behavioral consistency under recognition as a promising path forward.

CLSep 9, 2024
ClarQ-LLM: A Benchmark for Models Clarifying and Requesting Information in Task-Oriented Dialog

Yujian Gan, Changling Li, Jinxia Xie et al.

We introduce ClarQ-LLM, an evaluation framework consisting of bilingual English-Chinese conversation tasks, conversational agents and evaluation metrics, designed to serve as a strong benchmark for assessing agents' ability to ask clarification questions in task-oriented dialogues. The benchmark includes 31 different task types, each with 10 unique dialogue scenarios between information seeker and provider agents. The scenarios require the seeker to ask questions to resolve uncertainty and gather necessary information to complete tasks. Unlike traditional benchmarks that evaluate agents based on fixed dialogue content, ClarQ-LLM includes a provider conversational agent to replicate the original human provider in the benchmark. This allows both current and future seeker agents to test their ability to complete information gathering tasks through dialogue by directly interacting with our provider agent. In tests, LLAMA3.1 405B seeker agent managed a maximum success rate of only 60.05\%, showing that ClarQ-LLM presents a strong challenge for future research.

CLJun 28, 2025Code
Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models

Younwoo Choi, Changling Li, Yongjin Yang et al.

As large language models (LLMs) are increasingly integrated into multi-agent and human-AI systems, understanding their awareness of both self-context and conversational partners is essential for ensuring reliable performance and robust safety. While prior work has extensively studied situational awareness which refers to an LLM's ability to recognize its operating phase and constraints, it has largely overlooked the complementary capacity to identify and adapt to the identity and characteristics of a dialogue partner. In this paper, we formalize this latter capability as interlocutor awareness and present the first systematic evaluation of its emergence in contemporary LLMs. We examine interlocutor inference across three dimensions-reasoning patterns, linguistic style, and alignment preferences-and show that LLMs reliably identify same-family peers and certain prominent model families, such as GPT and Claude. To demonstrate its practical significance, we develop three case studies in which interlocutor awareness both enhances multi-LLM collaboration through prompt adaptation and introduces new alignment and safety vulnerabilities, including reward-hacking behaviors and increased jailbreak susceptibility. Our findings highlight the dual promise and peril of identity-sensitive behavior in LLMs, underscoring the need for further understanding of interlocutor awareness and new safeguards in multi-agent deployments. Our code is open-sourced at https://github.com/younwoochoi/InterlocutorAwarenessLLM.

NIOct 29, 2024
Energy-Aware Multi-Agent Reinforcement Learning for Collaborative Execution in Mission-Oriented Drone Networks

Ying Li, Changling Li, Jiyao Chen et al.

Mission-oriented drone networks have been widely used for structural inspection, disaster monitoring, border surveillance, etc. Due to the limited battery capacity of drones, mission execution strategy impacts network performance and mission completion. However, collaborative execution is a challenging problem for drones in such a dynamic environment as it also involves efficient trajectory design. We leverage multi-agent reinforcement learning (MARL) to manage the challenge in this study, letting each drone learn to collaboratively execute tasks and plan trajectories based on its current status and environment. Simulation results show that the proposed collaborative execution model can successfully complete the mission at least 80% of the time, regardless of task locations and lengths, and can even achieve a 100% success rate when the task density is not way too sparse. To the best of our knowledge, our work is one of the pioneer studies on leveraging MARL on collaborative execution for mission-oriented drone networks; the unique value of this work lies in drone battery level driving our model design.