MAMay 13, 2021
Emergent Prosociality in Multi-Agent Games Through GiftingWoodrow Z. Wang, Mark Beliaev, Erdem Bıyık et al.
Coordination is often critical to forming prosocial behaviors -- behaviors that increase the overall sum of rewards received by all agents in a multi-agent game. However, state of the art reinforcement learning algorithms often suffer from converging to socially less desirable equilibria when multiple equilibria exist. Previous works address this challenge with explicit reward shaping, which requires the strong assumption that agents can be forced to be prosocial. We propose using a less restrictive peer-rewarding mechanism, gifting, that guides the agents toward more socially desirable equilibria while allowing agents to remain selfish and decentralized. Gifting allows each agent to give some of their reward to other agents. We employ a theoretical framework that captures the benefit of gifting in converging to the prosocial equilibrium by characterizing the equilibria's basins of attraction in a dynamical system. With gifting, we demonstrate increased convergence of high risk, general-sum coordination games to the prosocial equilibrium both via numerical analysis and experiments.
MAMay 6, 2021
Incentivizing Efficient Equilibria in Traffic Networks with Mixed AutonomyErdem Bıyık, Daniel A. Lazar, Ramtin Pedarsani et al.
Traffic congestion has large economic and social costs. The introduction of autonomous vehicles can potentially reduce this congestion by increasing road capacity via vehicle platooning and by creating an avenue for influencing people's choice of routes. We consider a network of parallel roads with two modes of transportation: (i) human drivers, who will choose the quickest route available to them, and (ii) a ride hailing service, which provides an array of autonomous vehicle route options, each with different prices, to users. We formalize a model of vehicle flow in mixed autonomy and a model of how autonomous service users make choices between routes with different prices and latencies. Developing an algorithm to learn the preferences of the users, we formulate a planning optimization that chooses prices to maximize a social objective. We demonstrate the benefit of the proposed scheme by comparing the results to theoretical benchmarks which we show can be efficiently calculated.
SIDec 28, 2020
Incentivizing Routing Choices for Safe and Efficient Transportation in the Face of the COVID-19 PandemicMark Beliaev, Erdem Bıyık, Daniel A. Lazar et al.
The COVID-19 pandemic has severely affected many aspects of people's daily lives. While many countries are in a re-opening stage, some effects of the pandemic on people's behaviors are expected to last much longer, including how they choose between different transport options. Experts predict considerably delayed recovery of the public transport options, as people try to avoid crowded places. In turn, significant increases in traffic congestion are expected, since people are likely to prefer using their own vehicles or taxis as opposed to riskier and more crowded options such as the railway. In this paper, we propose to use financial incentives to set the tradeoff between risk of infection and congestion to achieve safe and efficient transportation networks. To this end, we formulate a network optimization problem to optimize taxi fares. For our framework to be useful in various cities and times of the day without much designer effort, we also propose a data-driven approach to learn human preferences about transport options, which is then used in our taxi fare optimization. Our user studies and simulation experiments show our framework is able to minimize congestion and risk of infection.
OCSep 9, 2019
Learning How to Dynamically Route Autonomous Vehicles on Shared RoadsDaniel A. Lazar, Erdem Bıyık, Dorsa Sadigh et al.
Road congestion induces significant costs across the world, and road network disturbances, such as traffic accidents, can cause highly congested traffic patterns. If a planner had control over the routing of all vehicles in the network, they could easily reverse this effect. In a more realistic scenario, we consider a planner that controls autonomous cars, which are a fraction of all present cars. We study a dynamic routing game, in which the route choices of autonomous cars can be controlled and the human drivers react selfishly and dynamically. As the problem is prohibitively large, we use deep reinforcement learning to learn a policy for controlling the autonomous vehicles. This policy indirectly influences human drivers to route themselves in such a way that minimizes congestion on the network. To gauge the effectiveness of our learned policies, we establish theoretical results characterizing equilibria and empirically compare the learned policy results with best possible equilibria. We prove properties of equilibria on parallel roads and provide a polynomial-time optimization for computing the most efficient equilibrium. Moreover, we show that in the absence of these policies, high demand and network perturbations would result in large congestion, whereas using the policy greatly decreases the travel times by minimizing the congestion. To the best of our knowledge, this is the first work that employs deep reinforcement learning to reduce congestion by indirectly influencing humans' routing decisions in mixed-autonomy traffic.
OCApr 3, 2019
The Green Choice: Learning and Influencing Human Decisions on Shared RoadsErdem Bıyık, Daniel A. Lazar, Dorsa Sadigh et al.
Autonomous vehicles have the potential to increase the capacity of roads via platooning, even when human drivers and autonomous vehicles share roads. However, when users of a road network choose their routes selfishly, the resulting traffic configuration may be very inefficient. Because of this, we consider how to influence human decisions so as to decrease congestion on these roads. We consider a network of parallel roads with two modes of transportation: (i) human drivers who will choose the quickest route available to them, and (ii) ride hailing service which provides an array of autonomous vehicle ride options, each with different prices, to users. In this work, we seek to design these prices so that when autonomous service users choose from these options and human drivers selfishly choose their resulting routes, road usage is maximized and transit delay is minimized. To do so, we formalize a model of how autonomous service users make choices between routes with different price/delay values. Developing a preference-based algorithm to learn the preferences of the users, and using a vehicle flow model related to the Fundamental Diagram of Traffic, we formulate a planning optimization to maximize a social objective and demonstrate the benefit of the proposed routing and learning scheme.
OCJul 12, 2018
Maximizing Road Capacity Using Cars that Influence PeopleDaniel A. Lazar, Kabir Chandrasekher, Ramtin Pedarsani et al.
The emerging technology enabling autonomy in vehicles has led to a variety of new problems in transportation networks, such as planning and perception for autonomous vehicles. Other works consider social objectives such as decreasing fuel consumption and travel time by platooning. However, these strategies are limited by the actions of the surrounding human drivers. In this paper, we consider proactively achieving these social objectives by influencing human behavior through planned interactions. Our key insight is that we can use these social objectives to design local interactions that influence human behavior to achieve these goals. To this end, we characterize the increase in road capacity afforded by platooning, as well as the vehicle configuration that maximizes road capacity. We present a novel algorithm that uses a low-level control framework to leverage local interactions to optimally rearrange vehicles. We showcase our algorithm using a simulated road shared between autonomous and human-driven vehicles, in which we illustrate the reordering in action.
OCSep 5, 2018
Routing for Traffic Networks with Mixed AutonomyDaniel A. Lazar, Sam Coogan, Ramtin Pedarsani
In this work we propose a macroscopic model for studying routing on networks shared between human-driven and autonomous vehicles that captures the effects of autonomous vehicles forming platoons. We use this to study inefficiency due to selfish routing and bound the Price of Anarchy (PoA), the maximum ratio between total delay experienced by selfish users and the minimum possible total delay. To do so, we establish two road capacity models, each corresponding to an assumption regarding the platooning capabilities of autonomous vehicles. Using these we develop a class of road delay functions, parameterized by the road capacity, that are polynomial with respect to vehicle flow. We then bound the PoA and the bicriteria, another measure of the inefficiency due to selfish routing. We find these bounds depend on: 1) the degree of the polynomial in the road cost function and 2) the degree of asymmetry, the difference in how human-driven and autonomous traffic affect congestion. We demonstrate that these bounds recover the classical bounds when no asymmetry exists. We show the bounds are tight in certain cases and that the PoA bound is order-optimal with respect to the degree of asymmetry.