97.0LGMay 20Code
Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series ForecastingJiawen Zhu, Shuhan Liu, Di Weng et al.
Non-stationary time series forecasting is challenged by evolving distribution shifts that static models struggle to capture. While Mixture-of-Experts (MoE) architectures offer a promising paradigm for decoupling complex drift patterns, existing approaches are limited by fixed expert pools and memoryless routing, hampering their ability to adapt to abrupt regime shifts. To address this, we propose Dynamic TMoE, a framework that unifies architectural evolution with temporal continuity during learning phase. By detecting distribution shifts via Maximum Mean Discrepancy (MMD), we dynamically instantiate heterogeneous experts and prune redundant ones to optimize capacity. Additionally, a temporal memory router leverages recurrent states and an anomaly repository to ensure stable, context-aware expert selection without requiring test-time updates. Experiments on nine benchmarks demonstrate state-of-the-art performance, reducing MSE by 10.4% and MAE by 7.8%. Code is available at https://github.com/andone-07/Dynamic-TMoE.
89.9SEMar 27Code
A Benchmark for Evaluating Repository-Level Code Agents with Intermediate Reasoning on Feature Addition TaskShuhan Liu, Zhiyi Zhao, Xing Hu et al.
Repository-level code agents have shown strong promise in real-world feature addition tasks, making reliable evaluation of their capabilities increasingly important. However, existing benchmarks primarily evaluate these agents as black boxes based on final test correctness, providing limited insight into how they reason and where failures arise. To address this limitation, we introduce RACE-bench, a reasoning-augmented benchmark for evaluating code agents on repository-level feature addition tasks. RACE-bench contains 528 real-world feature addition instances from 12 open-source repositories. Each instance is paired with executable patch verification and structured intermediate reasoning ground truth covering issue understanding, file localization, implementation tasks, and step decomposition. Based on this design, we introduce a dual-track evaluation framework that jointly measures patch correctness and intermediate reasoning quality. We evaluate three representative repository-level code agents on RACE-bench. On the full benchmark, Resolved Rates range from 29% to 70% across different agents. Our reasoning-level analysis further shows that while current agents perform well at understanding high-level intent, their performance degrades substantially when translating intent into concrete implementation steps. We also find that apply-success but test-fail cases exhibit lower reasoning recall (35.7% decrease) and higher over-prediction (94.1% increase) compared to successful cases. These findings highlight the importance of evaluating repository-level code agents beyond final patch correctness by examining the quality of their reasoning processes.
97.9SPMay 1
Adaptive 3D-RoPE: Physics-Aligned Rotary Positional Encoding for Wireless Foundation ModelsChenyu Zhang, Xinchen Lyu, Chenshan Ren et al.
Positional encoding plays a pivotal role in determin?ing the extrapolation and generalization performance of wireless foundation models for channel state information (CSI) modeling, latent characterization, and task-specific prediction. However, existing CSI models inherit static or one-dimensional positional priors from natural language and vision architectures, which fundamentally misalign with the intrinsic physics of wireless channels by lacking explicit relative decay, collapsing the 3D spatio-temporal-frequency structure, and remaining scenario?rigid. This paper proposes Adaptive 3D-RoPE, a physics-aligned rotary positional encoding that establishes the structural corner?stone for wireless foundation models. The framework integrates a learnable, axis-decoupled 3D frequency bank to explicitly disentangle multi-dimensional phase dependencies, coupled with a lightweight channel-conditioned controller that dynamically modulates the prior via compact global CSI descriptors. This sample-adaptive mechanism transforms positional encoding from a static transformer component into a dynamic, coherence-aware inductive bias to resolve heterogeneous channel physics. Extensive experiments across 100 datasets demonstrate the superiority of the proposed scheme in both scale extrapolation and zero-shot generalization. Compared to the state-of-the-art, our method achieves up to a 10.7 dB reduction in normalized mean square error (NMSE) under 8 times antenna scale extrapolation. Given the same CSI input scales, our method can also improve zero-shot NMSE by 1.07 dB across unseen mobility scenarios and 0.90 dB in low-frequency-to-millimeter-wave tasks.
LGJan 26
HeterCSI: Channel-Adaptive Heterogeneous CSI Pretraining Framework for Generalized Wireless Foundation ModelsChenyu Zhang, Xinchen Lyu, Chenshan Ren et al.
Wireless foundation models promise transformative capabilities for channel state information (CSI) processing across diverse 6G network applications, yet face fundamental challenges due to the inherent dual heterogeneity of CSI across both scale and scenario dimensions. However, current pretraining approaches either constrain inputs to fixed dimensions or isolate training by scale, limiting the generalization and scalability of wireless foundation models. In this paper, we propose HeterCSI, a channel-adaptive pretraining framework that reconciles training efficiency with robust cross-scenario generalization via a new understanding of gradient dynamics in heterogeneous CSI pretraining. Our key insight reveals that CSI scale heterogeneity primarily causes destructive gradient interference, while scenario diversity actually promotes constructive gradient alignment when properly managed. Specifically, we formulate heterogeneous CSI batch construction as a partitioning optimization problem that minimizes zero-padding overhead while preserving scenario diversity. To solve this, we develop a scale-aware adaptive batching strategy that aligns CSI samples of similar scales, and design a double-masking mechanism to isolate valid signals from padding artifacts. Extensive experiments on 12 datasets demonstrate that HeterCSI establishes a generalized foundation model without scenario-specific finetuning, achieving superior average performance over full-shot baselines. Compared to the state-of-the-art zero-shot benchmark WiFo, it reduces NMSE by 7.19 dB, 4.08 dB, and 5.27 dB for CSI reconstruction, time-domain, and frequency-domain prediction, respectively. The proposed HeterCSI framework also reduces training latency by 53% compared to existing approaches while improving generalization performance by 1.53 dB on average.
LGOct 25, 2024Code
A Survey of Deep Graph Learning under Distribution Shifts: from Graph Out-of-Distribution Generalization to AdaptationKexin Zhang, Shuhan Liu, Song Wang et al.
Distribution shifts on graphs -- the discrepancies in data distribution between training and employing a graph machine learning model -- are ubiquitous and often unavoidable in real-world scenarios. These shifts may severely deteriorate model performance, posing significant challenges for reliable graph machine learning. Consequently, there has been a surge in research on graph machine learning under distribution shifts, aiming to train models to achieve satisfactory performance on out-of-distribution (OOD) test data. In our survey, we provide an up-to-date and forward-looking review of deep graph learning under distribution shifts. Specifically, we cover three primary scenarios: graph OOD generalization, training-time graph OOD adaptation, and test-time graph OOD adaptation. We begin by formally formulating the problems and discussing various types of distribution shifts that can affect graph learning, such as covariate shifts and concept shifts. To provide a better understanding of the literature, we introduce a systematic taxonomy that classifies existing methods into model-centric and data-centric approaches, investigating the techniques used in each category. We also summarize commonly used datasets in this research area to facilitate further investigation. Finally, we point out promising research directions and the corresponding challenges to encourage further study in this vital domain. We also provide a continuously updated reading list at https://github.com/kaize0409/Awesome-Graph-OOD.
LGFeb 17, 2024Code
Beyond Generalization: A Survey of Out-Of-Distribution Adaptation on GraphsShuhan Liu, Kaize Ding
Distribution shifts on graphs -- the data distribution discrepancies between training and testing a graph machine learning model, are often ubiquitous and unavoidable in real-world scenarios. Such shifts may severely deteriorate the performance of the model, posing significant challenges for reliable graph machine learning. Consequently, there has been a surge in research on graph Out-Of-Distribution (OOD) adaptation methods that aim to mitigate the distribution shifts and adapt the knowledge from one distribution to another. In our survey, we provide an up-to-date and forward-looking review of graph OOD adaptation methods, covering two main problem scenarios including training-time as well as test-time graph OOD adaptation. We start by formally formulating the two problems and then discuss different types of distribution shifts on graphs. Based on our proposed taxonomy for graph OOD adaptation, we systematically categorize the existing methods according to their learning paradigm and investigate the techniques behind them. Finally, we point out promising research directions and the corresponding challenges. We also provide a continuously updated reading list at https://github.com/kaize0409/Awesome-Graph-OOD-Adaptation.git
SESep 17, 2025
Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: A Self-Optimizing FrameworkKerui Huang, Shuhan Liu, Xing Hu et al.
Chain-of-Thought (CoT) reasoning enhances Large Language Models (LLMs) by prompting intermediate steps, improving accuracy and robustness in arithmetic, logic, and commonsense tasks. However, this benefit comes with high computational costs: longer outputs increase latency, memory usage, and KV-cache demands. These issues are especially critical in software engineering tasks where concise and deterministic outputs are required. To investigate these trade-offs, we conduct an empirical study based on code generation benchmarks. The results reveal that longer CoT does not always help. Excessive reasoning often causes truncation, accuracy drops, and latency up to five times higher, with failed outputs consistently longer than successful ones. These findings challenge the assumption that longer reasoning is inherently better and highlight the need for adaptive CoT control. Motivated by this, we propose SEER (Self-Enhancing Efficient Reasoning), an adaptive framework that compresses CoT while preserving accuracy. SEER combines Best-of-N sampling with task-aware adaptive filtering, dynamically adjusting thresholds based on pre-inference outputs to reduce verbosity and computational overhead. We then evaluate SEER on three software engineering tasks and one math task. On average, SEER shortens CoT by 42.1%, improves accuracy by reducing truncation, and eliminates most infinite loops. These results demonstrate SEER as a practical method to make CoT-enhanced LLMs more efficient and robust, even under resource constraints.
LGAug 29, 2025
DLGAN : Time Series Synthesis Based on Dual-Layer Generative Adversarial NetworksXuan Hou, Shuhan Liu, Zhaohui Peng et al.
Time series synthesis is an effective approach to ensuring the secure circulation of time series data. Existing time series synthesis methods typically perform temporal modeling based on random sequences to generate target sequences, which often struggle to ensure the temporal dependencies in the generated time series. Additionally, directly modeling temporal features on random sequences makes it challenging to accurately capture the feature information of the original time series. To address the above issues, we propose a simple but effective generative model \textbf{D}ual-\textbf{L}ayer \textbf{G}enerative \textbf{A}dversarial \textbf{N}etworks, named \textbf{DLGAN}. The model decomposes the time series generation process into two stages: sequence feature extraction and sequence reconstruction. First, these two stages form a complete time series autoencoder, enabling supervised learning on the original time series to ensure that the reconstruction process can restore the temporal dependencies of the sequence. Second, a Generative Adversarial Network (GAN) is used to generate synthetic feature vectors that align with the real-time sequence feature vectors, ensuring that the generator can capture the temporal features from real time series. Extensive experiments on four public datasets demonstrate the superiority of this model across various evaluation metrics.
LGAug 29, 2025
Stage-Diff: Stage-wise Long-Term Time Series Generation Based on Diffusion ModelsXuan Hou, Shuhan Liu, Zhaohui Peng et al.
Generative models have been successfully used in the field of time series generation. However, when dealing with long-term time series, which span over extended periods and exhibit more complex long-term temporal patterns, the task of generation becomes significantly more challenging. Long-term time series exhibit long-range temporal dependencies, but their data distribution also undergoes gradual changes over time. Finding a balance between these long-term dependencies and the drift in data distribution is a key challenge. On the other hand, long-term time series contain more complex interrelationships between different feature sequences, making the task of effectively capturing both intra-sequence and inter-sequence dependencies another important challenge. To address these issues, we propose Stage-Diff, a staged generative model for long-term time series based on diffusion models. First, through stage-wise sequence generation and inter-stage information transfer, the model preserves long-term sequence dependencies while enabling the modeling of data distribution shifts. Second, within each stage, progressive sequence decomposition is applied to perform channel-independent modeling at different time scales, while inter-stage information transfer utilizes multi-channel fusion modeling. This approach combines the robustness of channel-independent modeling with the information fusion advantages of multi-channel modeling, effectively balancing the intra-sequence and inter-sequence dependencies of long-term time series. Extensive experiments on multiple real-world datasets validate the effectiveness of Stage-Diff in long-term time series generation tasks.
HCSep 1, 2020
PlotThread: Creating Expressive Storyline Visualizations using Reinforcement LearningTan Tang, Renzhong Li, Xinke Wu et al.
Storyline visualizations are an effective means to present the evolution of plots and reveal the scenic interactions among characters. However, the design of storyline visualizations is a difficult task as users need to balance between aesthetic goals and narrative constraints. Despite that the optimization-based methods have been improved significantly in terms of producing aesthetic and legible layouts, the existing (semi-) automatic methods are still limited regarding 1) efficient exploration of the storyline design space and 2) flexible customization of storyline layouts. In this work, we propose a reinforcement learning framework to train an AI agent that assists users in exploring the design space efficiently and generating well-optimized storylines. Based on the framework, we introduce PlotThread, an authoring tool that integrates a set of flexible interactions to support easy customization of storyline visualizations. To seamlessly integrate the AI agent into the authoring process, we employ a mixed-initiative approach where both the agent and designers work on the same canvas to boost the collaborative design of storylines. We evaluate the reinforcement learning model through qualitative and quantitative experiments and demonstrate the usage of PlotThread using a collection of use cases.