Marina Haliem

AI
6papers
154citations
Novelty53%
AI Score43

6 Papers

LGOct 7, 2022
Multi-agent Deep Covering Skill Discovery

Jiayu Chen, Marina Haliem, Tian Lan et al.

The use of skills (a.k.a., options) can greatly accelerate exploration in reinforcement learning, especially when only sparse reward signals are available. While option discovery methods have been proposed for individual agents, in multi-agent reinforcement learning settings, discovering collaborative options that can coordinate the behavior of multiple agents and encourage them to visit the under-explored regions of their joint state space has not been considered. In this case, we propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space. Also, we propose a novel framework to adopt the multi-agent options in the MARL process. In practice, a multi-agent task can usually be divided into some sub-tasks, each of which can be completed by a sub-group of the agents. Therefore, our algorithm framework first leverages an attention mechanism to find collaborative agent sub-groups that would benefit most from coordinated actions. Then, a hierarchical algorithm, namely HA-MSAC, is developed to learn the multi-agent options for each sub-group to complete their sub-tasks first, and then to integrate them through a high-level policy as the solution of the whole task. This hierarchical option construction allows our framework to strike a balance between scalability and effective collaboration among the agents. The evaluation based on multi-agent collaborative tasks shows that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options, in terms of both faster exploration and higher task rewards.

65.7CVMay 19
VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving

Zhefan Xu, Ghassen Jerfel, Marina Haliem et al.

The rapid growth of autonomous driving datasets has enabled the scaling of powerful motion forecasting models. While large-scale pretraining provides strong performance, the standard imitation objective may not fully capture the complex nuances of human driving preferences. Meanwhile, recent advances in vision-language models (VLMs) have demonstrated impressive reasoning and commonsense understanding. Building on these capabilities, this paper presents VL-DPO, a vision-language-guided framework that aligns ego-vehicle motion forecasting models with human preferences. Our approach leverages a VLM as a zero-shot reasoner to automatically generate preference pairs from a pretrained model's rollouts, which are then used to finetune the model via Direct Preference Optimization (DPO). We finetune our models on the Waymo Open End-to-End Driving Dataset (WOD-E2E) and evaluate performance against held-out human preference annotations using rater feedback score (RFS) and average displacement error (ADE). Our experiments confirm that the VLM's trajectory selection is a high-quality proxy for human preference. Our final model, VL-DPO, yields an 11.94% increase in RFS and a 10.01% reduction in ADE over the pretrained model.

AIApr 1, 2021
AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection

Marina Haliem, Vaneet Aggarwal, Bharat Bhargava

This paper introduces an adaptive model-free deep reinforcement approach that can recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling. Deep Reinforcement Learning (RL) suffers from catastrophic forgetting due to being agnostic to the timescale of changes in the distribution of experiences. Although RL algorithms are guaranteed to converge to optimal policies in Markov decision processes (MDPs), this only holds in the presence of static environments. However, this assumption is very restrictive. In many real-world problems like ride-sharing, traffic control, etc., we are dealing with highly dynamic environments, where RL methods yield only sub-optimal decisions. To mitigate this problem in highly dynamic environments, we (1) adopt an online Dirichlet change point detection (ODCP) algorithm to detect the changes in the distribution of experiences, (2) develop a Deep Q Network (DQN) agent that is capable of recognizing diurnal patterns and making informed dispatching decisions according to the changes in the underlying environment. Rather than fixing patterns by time of week, the proposed approach automatically detects that the MDP has changed, and uses the results of the new model. In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework that dynamically generates optimal routes for each vehicle based on online demand, vehicle capacities, and locations. Evaluation on New York City Taxi public dataset shows the effectiveness of our approach in improving the fleet utilization, where less than 50% of the fleet are utilized to serve the demand of up to 90% of the requests, while maximizing profits and minimizing idle times.

LGMar 1, 2021
Decision Making in Monopoly using a Hybrid Deep Reinforcement Learning Approach

Trevor Bonjour, Marina Haliem, Aala Alsalem et al.

Learning to adapt and make real-time informed decisions in a dynamic and complex environment is a challenging problem. Monopoly is a popular strategic board game that requires players to make multiple decisions during the game. Decision-making in Monopoly involves many real-world elements such as strategizing, luck, and modeling of opponent's policies. In this paper, we present novel representations for the state and action space for the full version of Monopoly and define an improved reward function. Using these, we show that our deep reinforcement learning agent can learn winning strategies for Monopoly against different fixed-policy agents. In Monopoly, players can take multiple actions even if it is not their turn to roll the dice. Some of these actions occur more frequently than others, resulting in a skewed distribution that adversely affects the performance of the learning agent. To tackle the non-uniform distribution of actions, we propose a hybrid approach that combines deep reinforcement learning (for frequent but complex decisions) with a fixed policy approach (for infrequent but straightforward decisions). Experimental results show that our hybrid agent outperforms a standard deep reinforcement learning agent by 30% in the number of games won against fixed-policy agents.

AINov 17, 2020
PassGoodPool: Joint Passengers and Goods Fleet Management with Reinforcement Learning aided Pricing, Matching, and Route Planning

Kaushik Manchella, Marina Haliem, Vaneet Aggarwal et al.

The ubiquitous growth of mobility-on-demand services for passenger and goods delivery has brought various challenges and opportunities within the realm of transportation systems. As a result, intelligent transportation systems are being developed to maximize operational profitability, user convenience, and environmental sustainability. The growth of last mile deliveries alongside ridesharing calls for an efficient and cohesive system that transports both passengers and goods. Existing methods address this using static routing methods considering neither the demands of requests nor the transfer of goods between vehicles during route planning. In this paper, we present a dynamic and demand aware fleet management framework for combined goods and passenger transportation that is capable of (1) Involving both passengers and drivers in the decision-making process by allowing drivers to negotiate to a mutually suitable price, and passengers to accept/reject, (2) Matching of goods to vehicles, and the multi-hop transfer of goods, (3) Dynamically generating optimal routes for each vehicle considering demand along their paths, based on the insertion cost which then determines the matching, (4) Dispatching idle vehicles to areas of anticipated high passenger and goods demand using Deep Reinforcement Learning (RL), (5) Allowing for distributed inference at each vehicle while collectively optimizing fleet objectives. Our proposed model is deployable independently within each vehicle as this minimizes computational costs associated with the growth of distributed systems and democratizes decision-making to each individual. Simulations on a variety of vehicle types, goods, and passenger utility functions show the effectiveness of our approach as compared to other methods that do not consider combined load transportation or dynamic multi-hop route planning.

MAOct 5, 2020
A Distributed Model-Free Ride-Sharing Approach for Joint Matching, Pricing, and Dispatching using Deep Reinforcement Learning

Marina Haliem, Ganapathy Mani, Vaneet Aggarwal et al.

Significant development of ride-sharing services presents a plethora of opportunities to transform urban mobility by providing personalized and convenient transportation while ensuring efficiency of large-scale ride pooling. However, a core problem for such services is route planning for each driver to fulfill the dynamically arriving requests while satisfying given constraints. Current models are mostly limited to static routes with only two rides per vehicle (optimally) or three (with heuristics). In this paper, we present a dynamic, demand aware, and pricing-based vehicle-passenger matching and route planning framework that (1) dynamically generates optimal routes for each vehicle based on online demand, pricing associated with each ride, vehicle capacities and locations. This matching algorithm starts greedily and optimizes over time using an insertion operation, (2) involves drivers in the decision-making process by allowing them to propose a different price based on the expected reward for a particular ride as well as the destination locations for future rides, which is influenced by supply-and demand computed by the Deep Q-network, (3) allows customers to accept or reject rides based on their set of preferences with respect to pricing and delay windows, vehicle type and carpooling preferences, and (4) based on demand prediction, our approach re-balances idle vehicles by dispatching them to the areas of anticipated high demand using deep Reinforcement Learning (RL). Our framework is validated using the New York City Taxi public dataset; however, we consider different vehicle types and designed customer utility functions to validate the setup and study different settings. Experimental results show the effectiveness of our approach in real-time and large scale settings.