85.3 RO Apr 14 Code
Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving
Haojie Bai, Aimin Li, Ruoyu Yao et al.
Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion planners can model multimodal behaviors from demonstrations, but they often exhibit weak scene consistency and remain poorly aligned with closed-loop objectives; meanwhile, stable online post-training in reactive multi-agent environments remains difficult. We present Multi-ORFT, which couples scene-conditioned diffusion pre-training with stable online reinforcement post-training. In pre-training, the planner uses inter-agent self-attention, cross-attention, and AdaLN-Zero-based scene conditioning to improve scene consistency and road adherence of joint trajectories. In post-training, we formulate a two-level MDP that exposes step-wise reverse-kernel likelihoods for online optimization, and combine dense trajectory-level rewards with variance-gated group-relative policy optimization (VG-GRPO) to stabilize training. On the WOMD closed-loop benchmark, Multi-ORFT reduces collision rate from 2.04% to 1.89% and off-road rate from 1.68% to 1.36%, while increasing average speed from 8.36 to 8.61 m/s relative to the pre-trained planner, and it outperforms strong open-source baselines including SMART-large, SMART-tiny-CLSFT, and VBD on the primary safety and efficiency metrics. These results show that coupling scene-consistent denoising with stable online diffusion-policy optimization improves the reliability of closed-loop cooperative driving.
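The abstract names variance-gated group-relative policy optimization (VG-GRPO) but does not define the gate. As a minimal sketch of one plausible reading, the snippet below computes standard group-relative advantages (each reward normalized by its group's mean and standard deviation) and zeroes them for groups whose reward variance falls below a threshold. The function name, the `var_threshold` parameter, and the hard on/off gate are assumptions for illustration, not the authors' formulation.

```python
import numpy as np


def group_relative_advantages(rewards, var_threshold=1e-4, eps=1e-8):
    """Group-relative advantages with a hypothetical variance gate.

    rewards: array-like of shape (num_groups, group_size); each row holds
             the trajectory-level rewards of samples from the same scene.
    var_threshold: ASSUMED gate level; groups whose reward variance is at
             or below it contribute zero advantage.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=1, keepdims=True)
    var = rewards.var(axis=1, keepdims=True)

    # Standard group-relative normalization: (r - mean) / std.
    adv = (rewards - mean) / np.sqrt(var + eps)

    # Gate: suppress near-degenerate groups, where dividing by a tiny
    # standard deviation would amplify noise into large advantages.
    gate = var > var_threshold
    return np.where(gate, adv, 0.0)


if __name__ == "__main__":
    demo = group_relative_advantages([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]])
    print(demo)
```

In this sketch the second group has identical rewards, so its advantages are gated to zero, while the first group receives zero-mean normalized advantages.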
12.5 SY May 26
In-Orbit Intelligence or Ground Offloading? Inference Freshness under Intermittent Satellite Connectivity
Ayse Nur Pehlivanoglu, Aimin Li, Elif Uysal
This paper studies how to balance onboard and ground computation under intermittent LEO connectivity to optimize inference freshness. As connectivity varies over time, the system switches among four actions: onboard computation, cached semantic transmission, raw-data offloading, and waiting. We define Age of Inference (AoInf) as the performance metric, where the age resets only upon successful task-valid updates. We formulate long-run average AoInf minimization as a finite-state average-cost semi-Markov decision process (SMDP) whose state captures the ground AoInf, orbital contact phase, cache occupancy, and cache age. We then transform the SMDP into an equivalent average-cost MDP and compute the solution via normalized relative value iteration (RVI). Numerical results indicate that the resulting hybrid policy reduces average AoInf relative to onboard-only and offload-only baselines, while requiring fewer computational resources on the satellite than the former and fewer communication resources than the latter.
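To illustrate the kind of solver the abstract refers to, here is a minimal sketch of relative value iteration for a generic finite average-cost MDP. The toy model in the usage part (an age that grows by one per step and can be reset at a fixed cost) is an assumption made for illustration and is much simpler than the paper's state space. The `damping` factor is the textbook aperiodicity transformation, which may differ from the normalization the authors use.

```python
import numpy as np


def relative_value_iteration(P, c, damping=0.5, tol=1e-10, max_iter=100_000):
    """Relative value iteration for an average-cost MDP.

    P: transition tensor of shape (num_actions, num_states, num_states).
    c: one-step cost array of shape (num_actions, num_states).
    damping: mixes each update with the identity so that periodic chains
             still converge (aperiodicity transformation; the gain is
             rescaled by 1/damping at the end).
    Returns (average cost, relative values h with h[0] = 0, greedy policy).
    """
    h = np.zeros(P.shape[1])
    gain_damped = 0.0
    for _ in range(max_iter):
        q = c + P @ h                               # shape (actions, states)
        Th = damping * q.min(axis=0) + (1.0 - damping) * h
        gain_damped = Th[0]                         # state 0 is the reference
        h_new = Th - gain_damped
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = (c + P @ h).argmin(axis=0)
    return gain_damped / damping, h, policy


def toy_age_mdp(num_ages=6, update_cost=3.0):
    """HYPOTHETICAL toy model: state = age, action 0 = wait, 1 = update."""
    P = np.zeros((2, num_ages, num_ages))
    c = np.zeros((2, num_ages))
    for s in range(num_ages):
        P[0, s, min(s + 1, num_ages - 1)] = 1.0     # waiting: age grows
        P[1, s, 0] = 1.0                            # updating: age resets
        c[0, s] = s
        c[1, s] = s + update_cost
    return P, c


if __name__ == "__main__":
    gain, h, policy = relative_value_iteration(*toy_age_mdp())
    print(gain, policy)
```

For this toy model, updating at age `k` yields a cycle with average cost `k/2 + update_cost/(k+1)`, so with `update_cost=3` the best achievable average cost is 2, which the iteration should recover.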
38.5 IT May 25
Age of Information in Time-Varying Multi-Priority Queues
Burak Karasakal, Aimin Li, Elif Uysal
In networks with intermittent connectivity, such as mobile, aerial, and space systems, maintaining information freshness is complicated by time-varying arrivals, service disruptions, and interactions among traffic classes with different priorities. To capture these effects, we study a multi-priority single-server queue with time-varying arrivals and service rates under intermittent connectivity. Our main result shows that an appropriately selected collection of state-conditioned first moments closes exactly, leading to a finite-dimensional linear time-periodic Ordinary Differential Equation (ODE) system for the mean Age of Information (AoI) and mean Peak Age of Information (PAoI) of each priority class. For periodic arrival and service rates, we define a one-period state map by propagating the ODE over a single period, and use the periodicity condition to formulate the periodic steady state as a fixed point of this map. We then propose a fixed-point iteration algorithm and prove its convergence to the unique periodic steady state (PSS). Numerical results reveal that high-priority traffic can strongly reshape the service process seen by lower-priority classes.
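The fixed-point construction described above can be sketched on a generic linear time-periodic system x'(t) = A(t) x(t) + b(t). The code below builds the one-period map by integrating over a single period with a hand-written RK4 scheme and then iterates that map until it stops changing. The coefficient functions in the usage part are made-up placeholders rather than the paper's AoI moment equations, and the iteration only converges when the one-period map is a contraction, which is assumed here.

```python
import numpy as np


def one_period_map(x0, A, b, period, steps=400):
    """Propagate x' = A(t) x + b(t) over one period with classical RK4."""
    x = np.asarray(x0, dtype=float)
    dt = period / steps

    def f(t, y):
        return A(t) @ y + b(t)

    for k in range(steps):
        t = k * dt
        k1 = f(t, x)
        k2 = f(t + dt / 2, x + dt / 2 * k1)
        k3 = f(t + dt / 2, x + dt / 2 * k2)
        k4 = f(t + dt, x + dt * k3)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def periodic_steady_state(A, b, period, dim, tol=1e-10, max_iter=1000):
    """Iterate the one-period map until x(0) reproduces itself at x(T)."""
    x = np.zeros(dim)
    for _ in range(max_iter):
        x_next = one_period_map(x, A, b, period)
        if np.max(np.abs(x_next - x)) < tol:
            return x_next
        x = x_next
    raise RuntimeError("one-period map did not converge")


if __name__ == "__main__":
    # PLACEHOLDER coefficients with period 1 (not from the paper).
    A = lambda t: np.array([[-(1.0 + 0.5 * np.sin(2 * np.pi * t))]])
    b = lambda t: np.array([1.0 + np.cos(2 * np.pi * t)])
    x_star = periodic_steady_state(A, b, period=1.0, dim=1)
    print(x_star, one_period_map(x_star, A, b, 1.0))
```

The returned vector is the state at the start of a period; the full periodic steady-state trajectory follows by integrating once more from that point.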
48.6 NI May 18
ASTRA: Asynchronous Age-Aware Satellite Random Access via Mean-Field Control
Sayam Chakraborty, Aimin Li, Yigit Ince et al.
Satellite Internet-of-Things (IoT) enables massive status-update services beyond terrestrial coverage, but grant-free uplink access creates a coupled freshness-control problem: increasing repetition and receiver-side diversity improves a device's capture-SIC opportunities, yet the resulting population congestion degrades network-wide freshness. Existing AoI-aware random-access models often rely on slot-synchronous collisions, fixed delivery probabilities, or scalar transmit-or-wait decisions and therefore cannot capture asynchronous satellite uplinks with capture and SIC. This paper develops a PHY-aware mean-field framework, termed ASTRA (Asynchronous Age-Aware Satellite Random Access), for freshness-driven satellite IoT random access. We build an access model that captures asynchronous arrivals, partial overlaps, capture, and SIC while preserving the dependence of delivery success on each device's repetition-diversity action. We then formulate the population interaction as a scalable mean-field MDP in which devices optimize access timing and intensity using only local AoI observations. The resulting system admits a mean-field equilibrium in which individual optimality and endogenous congestion are mutually consistent. We further prove that the optimal equilibrium policy admits an age-threshold structure. Numerical results show that the proposed policy reduces AoI relative to age-independent baselines.