Nicolas Lanzetti

LG
4 papers
31 citations
Novelty 60%
AI Score 45

4 Papers

LG Oct 20, 2022
Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions

Antonio Terpin, Nicolas Lanzetti, Batuhan Yardim et al.

Policy Optimization (PO) algorithms have proven particularly well suited to the high dimensionality of real-world continuous control tasks. In this context, Trust Region Policy Optimization methods represent a popular approach to stabilize the policy updates. These usually rely on the Kullback-Leibler (KL) divergence to limit the change in the policy. The Wasserstein distance represents a natural alternative, in place of the KL divergence, to define trust regions or to regularize the objective function. However, state-of-the-art works either resort to its approximations or do not provide an algorithm for continuous state-action spaces, reducing the applicability of the method. In this paper, we explore optimal transport discrepancies (which include the Wasserstein distance) to define trust regions, and we propose a novel algorithm, Optimal Transport Trust Region Policy Optimization (OT-TRPO), for continuous state-action spaces. We circumvent the infinite-dimensional optimization problem for PO by providing a one-dimensional dual reformulation for which strong duality holds. We then analytically derive the optimal policy update given the solution of the dual problem. This way, we bypass the computation of optimal transport costs and of optimal transport maps, which we implicitly characterize by solving the dual formulation. Finally, we provide an experimental evaluation of our approach across various control tasks. Our results show that optimal transport discrepancies can offer an advantage over state-of-the-art approaches.
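To illustrate why a Wasserstein distance can be a natural alternative to the KL divergence when bounding a policy change, the following minimal sketch compares the two on a toy example. It is not the OT-TRPO algorithm from the paper. The function names `wasserstein_1d` and `kl_discrete` and the toy action values are assumptions made for illustration only.

```python
import math


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples on the real line.

    In one dimension, the optimal coupling matches sorted samples in order.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def kl_discrete(p, q):
    """KL(p || q) for discrete distributions on a shared support.

    Returns infinity when q assigns zero mass where p has positive mass.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical example: two deterministic policies choosing nearby actions.
old_actions = [0.0, 0.0, 0.0, 0.0]
new_actions = [0.1, 0.1, 0.1, 0.1]
w1 = wasserstein_1d(old_actions, new_actions)

# The same two policies as distributions over the support {0.0, 0.1}:
# new policy p = [0, 1], old policy q = [1, 0].
kl = kl_discrete([0.0, 1.0], [1.0, 0.0])

print(w1, kl)
```

Here the Wasserstein distance reflects how far the actions moved, whereas the KL divergence is infinite because the two policies have disjoint supports.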

79.0 OC Apr 24
Strategically Robust Linear Quadratic Dynamic Games

Boris Velasevic, Nicolas Lanzetti, Eric Mazumdar

We study linear quadratic dynamic games where players are uncertain about each other's control policies or goals and consequently seek to be strategically robust. Building on recent work on strategically robust and risk-averse game theory, we first formalize the problem of strategically robust linear quadratic dynamic games. We show that these can be rewritten as simple transformations of linear quadratic games in which each player chooses a controller in a fictitious game in which they are faced with an adversary who is penalized for deviating from the other players' policies. This formulation naturally induces a novel notion of dynamic equilibrium, which we call a strategically robust dynamic equilibrium. We establish existence and uniqueness of such equilibria and furthermore show that the equilibrium policies are Markovian, linear, and can be efficiently computed via coupled backward Riccati equations. Through numerical simulations, including experiments in a network game, we illustrate the benefits of strategic robustness in designing robust and resilient decentralized control schemes. Our experiments also expose a "free-lunch" phenomenon in games in which robustness does not incur a corresponding loss in performance but can yield improvements in players' utilities and social welfare.
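For readers unfamiliar with backward Riccati equations, the sketch below shows the standard single-player, scalar, finite-horizon recursion from linear quadratic control. It is not the coupled multi-player recursion derived in the paper. The function name `backward_riccati` and the model (dynamics x_{t+1} = a x_t + b u_t with stage cost q x_t^2 + r u_t^2 and terminal cost q_final x_T^2) are assumptions chosen for illustration.

```python
def backward_riccati(a, b, q, r, q_final, horizon):
    """Scalar finite-horizon LQR solved by backward recursion.

    Returns the cost-to-go coefficients P[0..horizon] and feedback gains
    K[0..horizon-1], where the optimal control is u_t = -K[t] * x_t.
    """
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q_final  # terminal condition
    for t in range(horizon - 1, -1, -1):
        P_next = P[t + 1]
        # Gain minimizing the one-step cost plus the cost-to-go.
        K[t] = a * b * P_next / (r + b * b * P_next)
        # Riccati update for the cost-to-go coefficient.
        P[t] = q + a * a * P_next - a * b * P_next * K[t]
    return P, K


# Hypothetical example: unit dynamics and unit weights over one step.
P, K = backward_riccati(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=1)
print(P[0], K[0])
```

The resulting policy is Markovian and linear in the state, which is the same structural property the abstract reports for the equilibrium policies of the game.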

52.0 GT Apr 26
Strategically Robust Aggregative Games

Andreas Feik, Nicolas Lanzetti, Saverio Bolognani et al.

In many multiagent settings, such as electric vehicle charging and traffic routing, agents must make decisions in the face of uncertain behavior exhibited by others. Often, this uncertainty arises from multiple sources, such as incomplete information, limited computation, or bounded rationality, ultimately impacting the aggregate behavior. To tackle this challenge, we follow recent work on strategically robust game theory and postulate that agents seek protection directly against deviations around the emergent behavior, as opposed to explicitly modeling all sources of uncertainty. Specifically, we propose that each agent protects itself against the worst-case aggregate behavior within an optimal-transport-based ambiguity set centered at the emergent aggregate population behavior. This leads to a novel equilibrium concept, called strategically robust Wardrop equilibrium, that enables one to interpolate between standard Wardrop equilibria (no robustness) and security strategies (maximum robustness). In the setting of convex aggregative games, we establish the existence of a pure strategically robust Wardrop equilibrium and provide tractable computational tools for computing it. Through an application in electric vehicle charging, we demonstrate that strategically robust Wardrop equilibria lead to better decisions, protecting agents against the uncertain aggregate behavior of the population. Remarkably, we also observe that strategic robustness can lead to lower equilibrium costs for all agents, uncovering a "coordination-via-robustification" effect.

LG Jun 18, 2024
Learning diffusion at lightspeed

Antonio Terpin, Nicolas Lanzetti, Martin Gadea et al.

Diffusion regulates numerous natural processes and the dynamics of many successful generative models. Existing models to learn the diffusion terms from observational data rely on complex bilevel optimization problems and model only the drift of the system. We propose a new simple model, JKOnet*, which bypasses the complexity of existing architectures while presenting significantly enhanced representational capabilities: JKOnet* recovers the potential, interaction, and internal energy components of the underlying diffusion process. JKOnet* minimizes a simple quadratic loss and outperforms other baselines in terms of sample efficiency, computational complexity, and accuracy. Additionally, JKOnet* provides a closed-form optimal solution for linearly parametrized functionals, and, when applied to predict the evolution of cellular processes from real-world data, it achieves state-of-the-art accuracy at a fraction of the computational cost of all existing methods. Our methodology is based on the interpretation of diffusion processes as energy-minimizing trajectories in the probability space via the so-called JKO scheme, which we study via its first-order optimality conditions.
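For context, the JKO scheme mentioned above can be written as follows. This is the standard textbook formulation, not a transcription of the paper's equations. The symbols (energy functional $J$, step size $\tau$, transport map $T$, and the three-term decomposition with potential $V$, interaction kernel $U$, and coefficient $\beta$) are notation assumed here for illustration.

```latex
% One JKO step: minimize the energy plus a Wasserstein proximity term.
\rho_{t+1} \in \operatorname*{arg\,min}_{\rho}
  \; J(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_t)

% A commonly used energy with potential, interaction, and internal energy terms
% (assumed form; conventions for constants vary):
J(\rho) = \int V(x)\,\mathrm{d}\rho(x)
  + \iint U(x - y)\,\mathrm{d}\rho(x)\,\mathrm{d}\rho(y)
  + \beta \int \rho(x) \log \rho(x)\,\mathrm{d}x

% First-order optimality condition, with T the optimal transport map
% from \rho_{t+1} to \rho_t (assuming \rho_{t+1} is absolutely continuous):
\nabla \frac{\delta J}{\delta \rho}[\rho_{t+1}](x)
  + \frac{1}{\tau}\bigl(x - T(x)\bigr) = 0
  \quad \text{for } \rho_{t+1}\text{-almost every } x
```

Since the optimality condition is an equality that must hold along the observed trajectory, penalizing its squared residual gives a quadratic loss in the unknown energy terms, which is consistent with the simple quadratic loss the abstract describes.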