LG SP ML | Dec 15, 2023

Risk-Aware Continuous Control with Neural Contextual Bandits

Jose A. Ayala-Romero, Andres Garcia-Saavedra, Xavier Costa-Perez

arXiv:2312.09961v16.64 citations | h-index: 29 | Has Code | AAAI

Originality: Incremental advance

AI Analysis

This paper addresses the problem of deploying learning solutions in real-world sequential decision-making settings with critical constraints, such as 5G networks, by incorporating risk awareness. The advance is incremental, since it builds on existing contextual bandit and actor-critic methods.

The paper tackles risk-aware decision-making in contextual bandit problems with constraints and continuous action spaces. It proposes a framework that balances constraint satisfaction against performance and evaluates it in a real-world 5G network use case, where it consistently meets the reliability target at the cost of an 8.5% increase in power consumption.

Recent advances in learning techniques have garnered attention for their applicability to a diverse range of real-world sequential decision-making problems. Yet, many practical applications have critical constraints for operation in real environments. Most learning solutions neglect the risk of failing to meet these constraints, hindering their implementation in real-world contexts. In this paper, we propose a risk-aware decision-making framework for contextual bandit problems, accommodating constraints and continuous action spaces. Our approach employs an actor multi-critic architecture, with each critic characterizing the distribution of performance and constraint metrics. Our framework is designed to cater to various risk levels, effectively balancing constraint satisfaction against performance. To demonstrate the effectiveness of our approach, we first compare it against state-of-the-art baseline methods in a synthetic environment, highlighting the impact of intrinsic environmental noise across different risk configurations. Finally, we evaluate our framework in a real-world use case involving a 5G mobile network where only our approach consistently satisfies the system constraint (a signal processing reliability target) with a small performance toll (8.5% increase in power consumption).
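
The trade-off between constraint satisfaction and performance at different risk levels can be illustrated with a toy sketch. This is not the paper's actor multi-critic architecture: the learned distributional critics are replaced by fixed sample sets, the actor is replaced by a search over a few discretized candidate actions, and every name below (`empirical_quantile`, `select_action`, `risk_level`, the power and reliability numbers) is a hypothetical chosen for illustration. The idea shown is only that requiring a lower quantile of the constraint metric to meet the target pushes the choice toward safer, costlier actions.

```python
import statistics


def empirical_quantile(samples, q):
    """Nearest-rank q-quantile (0 <= q <= 1) of a list of samples."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[idx]


def select_action(candidates, perf_samples, constraint_samples, target, risk_level):
    """Pick the candidate with the best mean performance among those whose
    pessimistic (lower-quantile) constraint estimate still meets the target.

    A smaller risk_level is more conservative: a lower quantile of the
    constraint metric must reach the target.
    """
    def pessimistic(a):
        return empirical_quantile(constraint_samples[a], risk_level)

    feasible = [a for a in candidates if pessimistic(a) >= target]
    if not feasible:
        # No candidate is safe enough: fall back to the safest one.
        return max(candidates, key=pessimistic)
    return max(feasible, key=lambda a: statistics.mean(perf_samples[a]))


# Hypothetical toy environment (integers keep the arithmetic exact):
# the action is a power level in percent, reliability is the power level
# plus noise, and performance is negative power (less power is better).
candidates = [70, 80, 90, 100]
noise = [-10, -5, 0, 5, 10]
rel = {a: [a + e for e in noise] for a in candidates}
perf = {a: [-a] for a in candidates}

neutral = select_action(candidates, perf, rel, target=80, risk_level=0.5)
averse = select_action(candidates, perf, rel, target=80, risk_level=0.1)
print(neutral, averse)  # 80 90
```

In this toy setting the risk-neutral choice meets the target only at the median, while the risk-averse choice spends more power so that even the worst sampled reliability reaches the target.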

