2.8 · NI · May 14
Resilience under Uncertainty: Securing 6G through Stochastic Reinstantiation of RAN Functions
Gabriel Almeida, Jacek Kibiłda, Joao F. Santos et al.
The disaggregation of base stations into discrete RAN functions introduces new threats to mobile networks: a failure in one RAN function can cascade and interrupt entire function chains, potentially degrading network performance and disrupting service. In this paper, we propose the first resilience mechanism for disaggregated mobile networks that leverages the adaptive reinstantiation of RAN functions under uncertainty to mitigate disruptions and maintain service continuity in the presence of compromised infrastructure. Our mechanism reacts to cascading failures that disrupt Radio Units (RUs) by reinstantiating Central Units (CUs) and Distributed Units (DUs) in alternative cloud locations, restoring their function chains while accounting for uncertainty in users' locations and wireless channel conditions during the in-failure state. We formulate this recovery process as a two-stage stochastic optimization problem, where reinstantiation and routing decisions are made under uncertainty, and bandwidth allocation decisions are made after the uncertainty is resolved. We solve the problem with a Sample Average Approximation (SAA), which yields a tractable deterministic equivalent problem. We numerically evaluate our approach on a real-world disaggregated mobile network topology across multiple failure scenarios and traffic demand conditions; our results show that our solution achieves up to 80% higher recovery performance than conventional resilience mechanisms.
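As a rough illustration of the two-stage structure described above, the following minimal Python sketch applies sample average approximation to a toy problem. It is not the paper's formulation: the site names, capacities, costs, and demand distribution are hypothetical, the first stage only picks a single cloud site, and the second stage collapses bandwidth allocation into serving as much demand as the chosen site's capacity allows.

```python
import random


def sample_scenarios(num_scenarios, num_users, seed=0):
    """Draw hypothetical per-user demand scenarios (uniform demand is an assumption)."""
    rng = random.Random(seed)
    return [
        [rng.uniform(1.0, 10.0) for _ in range(num_users)]
        for _ in range(num_scenarios)
    ]


def second_stage_value(capacity, demands):
    """Recourse step: with demands revealed, serve as much traffic as capacity allows."""
    return min(capacity, sum(demands))


def saa_choose_site(sites, scenarios):
    """First-stage step: pick the site with the best sample-average objective.

    `sites` maps a site name to (capacity, reinstantiation_cost); both fields
    are illustrative stand-ins, not quantities taken from the paper.
    """
    def average_objective(name):
        capacity, cost = sites[name]
        served = sum(second_stage_value(capacity, s) for s in scenarios)
        return served / len(scenarios) - cost

    return max(sites, key=average_objective)


if __name__ == "__main__":
    candidate_sites = {"edge": (20.0, 1.0), "regional": (60.0, 5.0)}
    scenarios = sample_scenarios(num_scenarios=200, num_users=8, seed=42)
    print(saa_choose_site(candidate_sites, scenarios))
```

Replacing the expectation with an average over sampled scenarios is what turns the stochastic problem into a deterministic one. In this toy version the first stage can be solved by enumeration; a realistic instance would hand the sampled problem to a mixed-integer solver instead.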
21.3 · LG · Apr 21
Rethinking Reinforcement Fine-Tuning in LVLM: Convergence, Reward Decomposition, and Generalization
Carter Adams, Rafael Oliveira, Gabriel Almeida et al.
Reinforcement fine-tuning with verifiable rewards (RLVR) has emerged as a powerful paradigm for equipping large vision-language models (LVLMs) with agentic capabilities such as tool use and multi-step reasoning. Despite striking empirical successes, most notably Visual Agentic Reinforcement Fine-Tuning (Visual-ARFT), the theoretical underpinnings of this paradigm remain poorly understood. In particular, two critical questions lack rigorous answers: (i) how does the composite structure of verifiable rewards (format compliance, answer accuracy, tool executability) affect the convergence of Group Relative Policy Optimization (GRPO), and (ii) why does training on a small set of tool-augmented tasks transfer to out-of-distribution domains? We address these gaps by introducing the Tool-Augmented Markov Decision Process (TA-MDP), a formal framework that models multimodal agentic decision-making with bounded-depth tool calls. Within this framework, we establish three main results. First, we prove that GRPO under composite verifiable rewards converges to a first-order stationary point at rate $O(1/\sqrt{T})$ with explicit dependence on the number of reward components and group size (Theorem 1). Second, we derive a Reward Decomposition Theorem that bounds the sub-optimality gap between decomposed per-component optimization and joint optimization, providing a precise characterization of when reward decomposition is beneficial (Theorem 2). Third, we establish a PAC-Bayes generalization bound for tool-augmented policies that explains the strong out-of-distribution transfer observed in Visual-ARFT (Theorem 3).
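To make the composite-reward setting concrete, the sketch below shows one common way a group-relative advantage is computed from a sum of verifiable reward components. It is an illustration under stated assumptions, not the paper's analysis: the component names mirror the three listed in the abstract, the equal default weights are hypothetical, and the use of the population standard deviation with a small epsilon is one convention among several.

```python
from statistics import mean, pstdev


def composite_reward(components, weights=None):
    """Combine per-component verifiable rewards into one scalar.

    `components` maps a component name (e.g. "format", "accuracy", "tool")
    to its score. Missing weights default to 1.0, which is an assumption.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0) * score for name, score in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical responses sampled for the same prompt.
    group = [
        {"format": 1.0, "accuracy": 1.0, "tool": 1.0},
        {"format": 1.0, "accuracy": 0.0, "tool": 1.0},
        {"format": 0.0, "accuracy": 0.0, "tool": 0.0},
        {"format": 1.0, "accuracy": 0.0, "tool": 0.0},
    ]
    rewards = [composite_reward(c) for c in group]
    print(group_relative_advantages(rewards))
```

Because the advantage depends only on how a response compares with the rest of its group, both the number of reward components and the group size shape the variance of the update, which is where those two quantities plausibly enter a convergence rate.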