Dongyang Xu

CV
h-index: 12
13 papers
184 citations
Novelty: 53%
AI Score: 53

13 Papers

LG May 28 Code
On-Policy Replay for Continual Supervised Fine-Tuning

Yan Chen, Taojie Zhu, Meng Zhang et al.

Continual supervised fine-tuning (SFT) is the de facto recipe for adapting large language models (LLMs) to a stream of downstream tasks, but it suffers from catastrophic forgetting of earlier capabilities. Recent work shows that on-policy signals -- training on the model's own outputs -- reduce forgetting more reliably than off-policy supervision. Existing on-policy methods route this signal through a new training objective (e.g., self-distillation losses with a teacher copy), inheriting an extra forward pass, schedule sensitivity, and stylistic drift from the teacher. We instead route the on-policy signal through the training data source. Our method, On-Policy Replay (OPR), rolls out the most recent checkpoint on a small budget of historical prompts, filters the generations by a task reward, and replays the surviving (prompt, model response) pairs as ordinary SFT examples. There is no teacher, no auxiliary loss, and no on-the-fly distillation. Across three 7--8B instruction-tuned backbones (Qwen2.5-7B-Instruct, Qwen3-8B, Llama3.1-8B-Instruct) on the TRACE continual-learning benchmark, OPR consistently reduces forgetting; on the sharpest stress test (Qwen2.5-7B-Instruct, Sequential SFT BWT -13.93), OPR lifts BWT to -0.65 at a 10% replay budget and to -2.29 at a 1% budget -- a 46% reduction in |BWT| over a tuned Vanilla Replay baseline, with 42--46% reductions observed across all three backbones. We give a KL-shrinkage interpretation that places OPR and prior on-policy distillation methods on a single axis, and we present a counterintuitive finding that explains why Vanilla Replay is already a strong baseline: low-score replay is uniformly worse than Vanilla Replay, demonstrating that the active ingredient in OPR is the on-policy distribution, not the response quality alone. Our code is available at https://github.com/Yancey2024/OnPolicyReplay.
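
The data-construction step described in this abstract (roll out, filter by reward, replay as SFT pairs) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `build_on_policy_replay`, the `generate` and `reward` callables, the fixed `threshold`, and the uniform sampling of historical prompts are all assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple


def build_on_policy_replay(
    prompts: List[str],
    generate: Callable[[str], str],       # hypothetical: latest checkpoint's rollout
    reward: Callable[[str, str], float],  # hypothetical: task reward for (prompt, response)
    budget: float = 0.1,                  # fraction of historical prompts to roll out
    threshold: float = 0.5,               # assumed cut-off; the abstract only says "filters"
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (prompt, model response) pairs that pass the reward filter."""
    if not prompts:
        return []
    rng = random.Random(seed)
    # Pick a small budget of historical prompts (uniform sampling is an assumption).
    k = min(len(prompts), max(1, round(len(prompts) * budget)))
    sampled = rng.sample(prompts, k)

    replay: List[Tuple[str, str]] = []
    for prompt in sampled:
        response = generate(prompt)  # on-policy: the model's own output
        if reward(prompt, response) >= threshold:
            replay.append((prompt, response))
    return replay
```

The surviving pairs would then be mixed into the next task's data as ordinary SFT examples, which is why no teacher model or auxiliary loss appears in the sketch.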

AI May 18 Code
TeleCom-Bench: How Far Are Large Language Models from Industrial Telecommunication Applications?

Jieting Xiao, Yun Lin, Huizhen Qiu et al.

While Large Language Models have achieved remarkable integration in various vertical scenarios, their deployment in the telecommunications domain remains exploratory due to the lack of a standardized evaluation framework. Current telecom benchmarks primarily focus on static, foundational knowledge and isolated atomic skills, neglecting the equipment-specific documentation and end-to-end industrial workflows essential for real-world production systems. To bridge this gap, we present TeleCom-Bench, a comprehensive benchmark comprising 12 evaluation sets with 22,678 curated samples, which evaluates LLMs across a synergistic hierarchy: (1) Multi-dimensional Knowledge Comprehension, which integrates telecommunication fundamentals, 3GPP protocols, and 5G network architecture with proprietary product knowledge across wired, core, and wireless networks via knowledge graph-driven synthesis; and (2) End-to-End Knowledge Application, which formalizes six core tasks on authentic trajectories from live network agent workflows, including intent recognition, entity extraction, event verification, tool invocation, root cause analysis, and solution generation, across network optimization and fault maintenance scenarios. Evaluations of eight state-of-the-art LLMs reveal a universal Execution Wall: while models achieve 90% accuracy in linguistic interface tasks such as intent recognition and entity extraction, performance collapses to approximately 30% in procedural execution tasks like solution generation. This capability gap demonstrates that current LLMs function competently as diagnosticians but fail as field engineers. TeleCom-Bench provides standardized diagnostics to precisely pinpoint this deficit, offering actionable guidance for domain-specific alignment toward production-ready telecom agents. The dataset and evaluation code have been released at https://github.com/ZTE-AICloud/TeleCom-Bench.

RO Jul 18, 2024
Risk-Aware Vehicle Trajectory Prediction Under Safety-Critical Scenarios

Qingfan Wang, Dongyang Xu, Gaoyuan Kuang et al.

Trajectory prediction is significant for intelligent vehicles to achieve high-level autonomous driving, and a lot of relevant research achievements have been made recently. Despite the rapid development, most existing studies solely focused on normal safe scenarios while largely neglecting safety-critical scenarios, particularly those involving imminent collisions. This oversight may result in autonomous vehicles lacking the essential predictive ability in such situations, posing a significant threat to safety. To address this gap, this paper proposes a risk-aware trajectory prediction framework tailored to safety-critical scenarios. Leveraging distinctive hazardous features, we develop three core risk-aware components. First, we introduce a risk-incorporated scene encoder, which augments conventional encoders with quantitative risk information to achieve risk-aware encoding of hazardous scene contexts. Next, we incorporate endpoint-risk-combined intention queries as prediction priors in the decoder to ensure that the predicted multimodal trajectories cover both various spatial intentions and risk levels. Lastly, an auxiliary risk prediction task is implemented for the ultimate risk-aware prediction. Furthermore, to support model training and performance evaluation, we introduce a safety-critical trajectory prediction dataset and tailored evaluation metrics. We conduct comprehensive evaluations and compare our model with several SOTA models. Results demonstrate the superior performance of our model, with a significant improvement in most metrics. This prediction advancement enables autonomous vehicles to execute correct collision avoidance maneuvers under safety-critical scenarios, eventually enhancing road traffic safety.

LG Apr 10 Code
Bridging SFT and RL: Dynamic Policy Optimization for Robust Reasoning

Taojie Zhu, Dongyang Xu, Ding Zou et al.

Post-training paradigms for Large Language Models (LLMs), primarily Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), face a fundamental dilemma: SFT provides stability (low variance) but suffers from high fitting bias, while RL enables exploration (low bias) but grapples with high gradient variance. Existing unified optimization strategies often employ naive loss weighting, overlooking the statistical conflict between these distinct gradient signals. In this paper, we provide a rigorous theoretical analysis of this bias-variance trade-off and propose DYPO (Dynamic Policy Optimization), a unified framework designed to structurally mitigate this conflict. DYPO integrates three core components: (1) a Group Alignment Loss (GAL) that leverages intrinsic group dynamics to significantly reduce RL gradient variance; (2) a Multi-Teacher Distillation mechanism that corrects SFT fitting bias via diverse reasoning paths; and (3) a Dynamic Exploitation-Exploration Gating mechanism that adaptively arbitrates between stable SFT and exploratory RL based on reward feedback. Theoretical analysis confirms that DYPO linearly reduces fitting bias and minimizes overall variance. Extensive experiments demonstrate that DYPO significantly outperforms traditional sequential pipelines, achieving an average improvement of 4.8% on complex reasoning benchmarks and 13.3% on out-of-distribution tasks. Our code is publicly available at https://github.com/Tocci-Zhu/DYPO.

CV Aug 3, 2024
STDA: Spatio-Temporal Dual-Encoder Network Incorporating Driver Attention to Predict Driver Behaviors Under Safety-Critical Scenarios

Dongyang Xu, Yiran Luo, Tianle Lu et al.

Accurate behavior prediction for vehicles is essential but challenging for autonomous driving. Most existing studies show satisfying performance under regular scenarios but neglect safety-critical scenarios. In this study, a spatio-temporal dual-encoder network named STDA for safety-critical scenarios was developed. Considering the exceptional capabilities of human drivers in terms of situational awareness and comprehending risks, driver attention was incorporated into STDA to facilitate swift identification of the critical regions, which is expected to improve both performance and interpretability. STDA contains four parts: the driver attention prediction module, which predicts driver attention; the fusion module designed to fuse the features between driver attention and raw images; the temporal encoder module used to enhance the capability to interpret dynamic scenes; and the behavior prediction module to predict the behavior. The experiment data are used to train and validate the model. The results show that STDA improves the G-mean from 0.659 to 0.719 when incorporating driver attention and adopting a temporal encoder module. In addition, extensive experimentation has been conducted to validate that the proposed module exhibits robust generalization capabilities and can be seamlessly integrated into other mainstream models.
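
For readers unfamiliar with the G-mean reported above, the sketch below computes it under the common definition for imbalanced classification: the geometric mean of per-class recall. The abstract does not spell out which definition the authors use, so this is an assumption, and the function name `g_mean` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def g_mean(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Geometric mean of per-class recall (assumed definition of G-mean)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    totals = Counter(y_true)                                    # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [hits[c] / totals[c] for c in totals]
    # A single class with zero recall drives the whole score to zero,
    # which is why the metric is favoured when rare classes matter.
    return math.prod(recalls) ** (1.0 / len(recalls))
```

With two classes this reduces to the square root of sensitivity times specificity, so a model that ignores the rare safety-critical class scores 0 no matter how accurate it is on the regular one.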

CV Jul 24, 2024
AHMF: Adaptive Hybrid-Memory-Fusion Model for Driver Attention Prediction

Dongyang Xu, Qingfan Wang, Ji Ma et al.

Accurate driver attention prediction can serve as a critical reference for intelligent vehicles in understanding traffic scenes and making informed driving decisions. Though existing studies on driver attention prediction improved performance by incorporating advanced saliency detection techniques, they overlooked the opportunity to achieve human-inspired prediction by analyzing driving tasks from a cognitive science perspective. During driving, drivers' working memory and long-term memory play crucial roles in scene comprehension and experience retrieval, respectively. Together, they form situational awareness, facilitating drivers to quickly understand the current traffic situation and make optimal decisions based on past driving experiences. To explicitly integrate these two types of memory, this paper proposes an Adaptive Hybrid-Memory-Fusion (AHMF) driver attention prediction model to achieve more human-like predictions. Specifically, the model first encodes information about specific hazardous stimuli in the current scene to form working memories. Then, it adaptively retrieves similar situational experiences from the long-term memory for final prediction. Utilizing domain adaptation techniques, the model performs parallel training across multiple datasets, thereby enriching the accumulated driving experience within the long-term memory module. Compared to existing models, our model demonstrates significant improvements across various metrics on multiple public datasets, proving the effectiveness of integrating hybrid memories in driver attention prediction.

CV Mar 18, 2024
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection

Ziying Song, Lei Yang, Shaoqing Xu et al.

Integrating LiDAR and camera information into Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to the inaccurate calibration relationship between LiDAR and the camera sensor. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called GraphBEV. Addressing errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via Graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our GraphBEV framework achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEVFusion by 1.6% on the nuScenes validation set. Importantly, our GraphBEV outperforms BEVFusion by 8.3% under conditions with misalignment noise.

CV Mar 19, 2024
M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving

Dongyang Xu, Haokun Li, Qingfan Wang et al.

End-to-end autonomous driving has witnessed remarkable progress. However, the extensive deployment of autonomous vehicles has yet to be realized, primarily due to 1) inefficient multi-modal environment perception: how to integrate data from multi-modal sensors more efficiently; 2) non-human-like scene understanding: how to effectively locate and predict critical risky agents in traffic scenarios like an experienced driver. To overcome these challenges, in this paper, we propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving. To better fuse multi-modal data and achieve higher alignment between different modalities, a novel Lidar-Vision-Attention-based Fusion (LVAFusion) module is proposed. By incorporating driver attention, we give autonomous vehicles a human-like scene understanding ability, allowing them to identify crucial areas within complex scenarios precisely and ensure safety. We conduct experiments on the CARLA simulator and achieve state-of-the-art performance with less data in closed-loop benchmarks. Source codes are available at https://anonymous.4open.science/r/M2DA-4772.

SP Jan 24, 2021
Quantum Learning Based Nonrandom Superimposed Coding for Secure Wireless Access in 5G URLLC

Dongyang Xu, Pinyi Ren

Secure wireless access in ultra-reliable low-latency communications (URLLC), which is a critical aspect of 5G security, has become increasingly important due to its potential support of grant-free configuration. In grant-free URLLC, precise allocation of different pilot resources to different users that share the same time-frequency resource is essential for the next generation NodeB (gNB) to exactly identify those users under access collision and to maintain precise channel estimation required for reliable data transmission. However, this process easily suffers from attacks on pilots. In this paper, we propose a quantum learning based nonrandom superimposed coding method to encode and decode pilots on multidimensional resources, such that the uncertainty of attacks can be learned quickly and eliminated precisely. Particularly, multiuser pilots for uplink access are encoded as distinguishable subcarrier activation patterns (SAPs) and gNB decodes pilots of interest from observed SAPs, a superposition of SAPs from access users, by joint design of attack mode detection and user activity detection through a quantum learning network (QLN). We found that the uncertainty lies in the identification process of codeword digits from the attacker, which can always be modelled as a black-box model, resolved by a quantum learning algorithm and quantum circuit. Novel analytical closed-form expressions of failure probability are derived to characterize the reliability of this URLLC system with short packet transmission. Simulations show that our method can bring ultra-high reliability and low latency despite attacks on pilots.
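
The idea of decoding users from a superposition of subcarrier activation patterns can be illustrated with a toy superimposed code. The sketch below is not the paper's method: it omits attack-mode detection and the quantum learning network entirely, assumes a noiseless channel where the observed pattern is the union of the active users' patterns, and uses a hypothetical 7-user codebook (the lines of the Fano plane) chosen only because no codeword is covered by the union of any two others.

```python
from typing import Dict, FrozenSet, Iterable, List

# Toy codebook (an assumption, not from the paper): user id -> active subcarriers.
# Any two patterns share exactly one subcarrier, so a third pattern can never be
# fully covered by two others, i.e. no false identification with <= 2 active users.
CODEBOOK: Dict[int, FrozenSet[int]] = {
    0: frozenset({0, 1, 2}),
    1: frozenset({0, 3, 4}),
    2: frozenset({0, 5, 6}),
    3: frozenset({1, 3, 5}),
    4: frozenset({1, 4, 6}),
    5: frozenset({2, 3, 6}),
    6: frozenset({2, 4, 5}),
}


def superimpose(active_users: Iterable[int]) -> FrozenSet[int]:
    """Observed SAP: the union (logical OR) of the active users' patterns."""
    observed: set = set()
    for user in active_users:
        observed |= CODEBOOK[user]
    return frozenset(observed)


def decode(observed: FrozenSet[int]) -> List[int]:
    """Declare a user active when its whole pattern appears in the observation."""
    return sorted(u for u, pattern in CODEBOOK.items() if pattern <= observed)
```

For instance, `decode(superimpose([1, 4]))` recovers exactly users 1 and 4 in this toy setting. An attacker who activates extra subcarriers would break that guarantee, which is the kind of uncertainty the abstract says the learning component is meant to resolve.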

SP Jan 21, 2019
Hierarchical 2-D Feature Coding for Secure Pilot Authentication in Multi-User Multi-Antenna OFDM Systems: A Reliability Bound Contraction Perspective

Dongyang Xu, Pinyi Ren, James A. Ritcey

Due to the publicly known and deterministic characteristic of pilot tones, pilot authentication (PA) in multi-user multi-antenna orthogonal frequency-division multiplexing systems is very susceptible to the jamming/nulling/spoofing behaviors. To solve this, in this paper, we develop a hierarchical 2-D feature (H2DF) coding theory that exploits the hidden pilot signal features, i.e., the energy feature and independence feature, to secure pilot information coding which is applied between legitimate parties through a well-designed five-layer hierarchical coding model to achieve secure multiuser PA (SMPA). The reliability of SMPA is characterized using the identification error probability (IEP) of pilot encoding and decoding with the exact closed-form upper and lower bounds. However, this phenomenon of non-tight bounds brings about the risk of long-term instability in SMPA. Therefore, a reliability bound contraction theory is developed to shrink the bound interval, and practically, this is done by an easy-to-implement technique, namely, codebook partition within the H2DF code. In this process, a tradeoff between the upper and lower bounds of IEP is identified and a problem of optimal upper and lower bound tradeoff is formulated, with the objective of optimizing the cardinality of sub-codebooks such that the upper and lower bounds coincide. Solving this, we finally derive an exact closed-form expression for IEP, which realizes a stable and highly reliable SMPA. Numerical results validate the stability and resilience of H2DF coding in SMPA.

SP Jan 21, 2019
Independence-Checking Coding for OFDM Channel Training Authentication: Protocol Design, Security, Stability, and Tradeoff Analysis

Dongyang Xu, Pinyi Ren, James A. Ritcey

In wireless OFDM communications systems, pilot tones, due to their publicly known and deterministic characteristic, suffer significant jamming/nulling/spoofing risks. Thus, the conventional channel training protocol using pilot tones could be attacked and paralyzed, which raises the issue of anti-attack channel training authentication (CTA), i.e., verifying the claims of identities of pilot tones and channel estimation samples. In this paper, we consider one-ring scattering scenarios with large-scale uniform linear arrays (ULA) and develop an independence-checking coding (ICC) theory to build a secure and stable CTA protocol, namely, ICC-based CTA (ICC-CTA) protocol. In this protocol, the pilot tones are not merely randomized and inserted into subcarriers but also encoded as diversified subcarrier activation patterns (SAPs) simultaneously. Those encoded SAPs, though camouflaged by malicious signals, can be identified and decoded into original pilots for high-accuracy channel impulse response (CIR) estimation. The CTA security is first characterized by the error probability of identifying legitimate CIR estimation samples. The CTA instability is formulated as the function of probability of stably estimating CIR against all available diversified SAPs. A realistic tradeoff between the CTA security and instability under the discretely distributed AoA is identified and an optimally stable tradeoff problem is formulated, with the objective of optimizing the code rate to maximize security while maintaining maximum stability forever. Solving this, we derive the closed-form expression of optimal code rate. Numerical results finally validate the resilience of proposed ICC-CTA protocol.

CR Mar 6, 2018
Optimal Grassmann Manifold Eavesdropping: A Huge Security Disaster for M-1-2 Wiretap Channels

Dongyang Xu, Pinyi Ren, James A. Ritcey

In this paper, we introduce an advanced eavesdropper that aims to paralyze artificial-noise-aided secure communications. We consider the M-1-2 Gaussian MISO wiretap channel, which consists of an M-antenna transmitter, a single-antenna receiver, and a two-antenna eavesdropper. This type of eavesdropper, by adopting an optimal Grassmann manifold (OGM) filtering structure, can reduce the maximum achievable secrecy rate (MASR) to zero by using only two receive antennas, regardless of the number of antennas at the transmitter. Specifically, the eavesdropper exploits linear filters to serially recover the legitimate information symbols and intends to find the optimal filter that minimizes the mean-square error (MSE) in estimating the symbols. During the process, a convex semidefinite programming (SDP) problem with constraints on the filter matrix can be formulated and solved. Interestingly, the resulting optimal filters constitute a complex Grassmann manifold on the matrix space. Based on the filters, a novel expression of MASR is derived and further verified to be zero under the noiseless environment. Besides this, an achievable variable region (AVR) that induces zero MASR is presented analytically in the noisy case. Numerical results are provided to illustrate the huge disaster with respect to secrecy rate.

IT Jan 23, 2018
Code-Frequency Block Group Coding for Anti-Spoofing Pilot Authentication in Multi-Antenna OFDM Systems

Dongyang Xu, Pinyi Ren, James A. Ritcey et al.

A pilot spoofer can paralyze the channel estimation in multi-user orthogonal frequency-division multiplexing (OFDM) systems by using the same publicly-known pilot tones as legitimate nodes. This causes the problem of pilot authentication (PA). To solve this, we propose, for a two-user multi-antenna OFDM system, a code-frequency block group (CFBG) coding based PA mechanism. Here multi-user pilot information, after being randomized independently to avoid being spoofed, is converted into activation patterns of subcarrier-block groups on code-frequency domain. Those patterns, though overlapped and interfered mutually in the wireless transmission environment, are qualified to be separated and identified as the original pilots with high accuracy, by exploiting CFBG coding theory and channel characteristic. Particularly, we develop the CFBG code through two steps, i.e., 1) devising an ordered signal detection technique to recognize the number of signals coexisting on each subcarrier block, and encoding each subcarrier block with the detected number; 2) constructing a zero-false-drop (ZFD) code and block detection based (BD) code via k-dimensional Latin hypercubes and integrating those two codes into the CFBG code. This code can bring a desirable pilot separation error probability (SEP), inversely proportional to the number of occupied subcarriers and antennas with a power of k. To apply the code to PA, a scheme of pilot conveying, separation and identification is proposed. Based on this novel PA, a joint channel estimation and identification mechanism is proposed to achieve high-precision channel recovery and simultaneously enhance PA without occupying extra resources. Simulation results verify the effectiveness of our proposed mechanism.