Aparna Taneja

LG
h-index: 33
Papers: 25
Citations: 240
Novelty: 50%
AI Score: 49

25 Papers

LG · Oct 23, 2023
Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization

Yunfan Zhao, Nikhil Behari, Edward Hughes et al.

Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $λ$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.

LG · Aug 11, 2024
The Bandit Whisperer: Communication Learning for Restless Bandits

Yunfan Zhao, Tonghan Wang, Dheeraj Nagaraj et al. · harvard, tsinghua

Applying Reinforcement Learning (RL) to Restless Multi-Arm Bandits (RMABs) offers a promising avenue for addressing allocation problems with resource constraints and temporal dynamics. However, classic RMAB models largely overlook the challenges of (systematic) data errors - a common occurrence in real-world scenarios due to factors like varying data collection protocols and intentional noise for differential privacy. We demonstrate that conventional RL algorithms used to train RMABs can struggle to perform well in such settings. To solve this problem, we propose the first communication learning approach in RMABs, where we study which arms, when involved in communication, are most effective in mitigating the influence of such systematic data errors. In our setup, the arms receive Q-function parameters from similar arms as messages to guide behavioral policies, steering Q-function updates. We learn communication strategies by considering the joint utility of messages across all pairs of arms and using a Q-network architecture that decomposes the joint utility. Both theoretical and empirical evidence validate the effectiveness of our method in significantly improving RMAB performance across diverse problems.

LG · May 30, 2022
Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Kai Wang, Lily Xu, Aparna Taneja et al.

Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear $O(H \sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions in $T$ episodes with a constant horizon $H$. Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset.
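The confidence-bound step described in this abstract can be illustrated with a minimal sketch. It assumes a Hoeffding-style radius for a single transition probability; the abstract does not give the exact bound UCWhittle uses, and the function name, the `delta` parameter, and the radius formula are illustrative assumptions rather than the authors' implementation.

```python
import math


def transition_confidence_interval(successes: int, visits: int,
                                   delta: float = 0.05) -> tuple[float, float]:
    """Hoeffding-style interval for one transition probability.

    `successes` counts observed transitions into the target state and
    `visits` counts how often the (state, action) pair was tried.
    The radius formula is an assumption chosen for illustration.
    """
    if visits == 0:
        # Nothing observed yet: every probability is still plausible.
        return (0.0, 1.0)
    p_hat = successes / visits
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * visits))
    return (max(0.0, p_hat - radius), min(1.0, p_hat + radius))


# The interval tightens as the same empirical rate is seen over more visits.
few = transition_confidence_interval(6, 10)
many = transition_confidence_interval(600, 1000)
```

An optimistic planner would then pick, within such intervals, the transition probabilities that make an arm look most valuable; the abstract states that UCWhittle does this through a bilinear program, which this sketch does not attempt.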

LG · Nov 13, 2023
Analyzing and Predicting Low-Listenership Trends in a Large-Scale Mobile Health Program: A Preliminary Investigation

Arshika Lalan, Shresth Verma, Kumar Madhu Sudan et al.

Mobile health programs are becoming an increasingly popular medium for dissemination of health information among beneficiaries in less privileged communities. Kilkari is one of the world's largest mobile health programs which delivers time sensitive audio-messages to pregnant women and new mothers. We have been collaborating with ARMMAN, a non-profit in India which operates the Kilkari program, to identify bottlenecks to improve the efficiency of the program. In particular, we provide an initial analysis of the trajectories of beneficiaries' interaction with the mHealth program and examine elements of the program that can be potentially enhanced to boost its success. We cluster the cohort into different buckets based on listenership so as to analyze listenership patterns for each group that could help boost program success. We also demonstrate preliminary results on using historical data in a time-series prediction to identify beneficiary dropouts and enable NGOs in devising timely interventions to strengthen beneficiary retention.

AI · Apr 28, 2022
ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria

Vineet Nair, Kritika Prakash, Michael Wilbur et al.

More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations' sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use-cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach outperforms baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and hopefully, pave the way for other data-driven programs to improve health outcomes in Nigeria.

AI · Feb 6, 2023
Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation

Aditya Mate, Bryan Wilder, Aparna Taneja et al.

We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluation of such allocation policies through RCTs proves difficult, notwithstanding the scale of the trial, because the individuals' outcomes are inextricably interlinked through resource constraints controlling the policy decisions. Our key contribution is to present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual trials, whose outcomes can be accurately ascertained, for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means -- we show that it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on synthetic, semi-synthetic as well as real case study data and show improved estimation accuracy across the board.

AI · Jan 19, 2023
Decision-Focused Evaluation: Analyzing Performance of Deployed Restless Multi-Arm Bandits

Paritosh Verma, Shresth Verma, Aditya Mate et al.

Restless multi-arm bandits (RMABs) are a popular decision-theoretic framework that has been used to model real-world sequential decision making problems in public health, wildlife conservation, communication systems, and beyond. Deployed RMAB systems typically operate in two stages: the first predicts the unknown parameters defining the RMAB instance, and the second employs an optimization algorithm to solve the constructed RMAB instance. In this work we provide and analyze the results from a first-of-its-kind deployment of an RMAB system in the public health domain, aimed at improving maternal and child health. Our analysis focuses on understanding the relationship between prediction accuracy and overall performance of deployed RMAB systems. This is crucial for determining the value of investing in improving predictive accuracy towards improving the final system performance, and is useful for diagnosing and monitoring deployed RMAB systems. Using real-world data from our deployed RMAB system, we demonstrate that an improvement in overall prediction accuracy may even be accompanied by a degradation in the performance of the RMAB system -- a broad investment of resources to improve overall prediction accuracy may not yield expected results. Following this, we develop decision-focused evaluation metrics to evaluate the predictive component and show that they are better at explaining (both empirically and theoretically) the overall performance of a deployed RMAB system.

32.0 · LG · Apr 8
Decisions and Deployment: The Five-Year SAHELI Project (2020-2025) on Restless Multi-Armed Bandits for Improving Maternal and Child Health

Shresth Verma, Arpan Dasgupta, Neha Madhiwalla et al.

Maternal and child health is a critical concern around the world. In many global health programs disseminating preventive care and health information, limited healthcare worker resources prevent continuous, personalised engagement with vulnerable beneficiaries. In such scenarios, it becomes crucial to optimally schedule limited live-service resources to maximise long-term engagement. To address this fundamental challenge, the multi-year SAHELI project (2020-2025), in collaboration with partner NGO ARMMAN, leverages AI to allocate scarce resources in a maternal and child health program in India. The SAHELI system solves this sequential resource allocation problem using a Restless Multi-Armed Bandit (RMAB) framework. A key methodological innovation is the transition from a traditional Two-Stage "predict-then-optimize" approach to Decision-Focused Learning (DFL), which directly aligns the framework's learning method with the ultimate goal of maximizing beneficiary engagement. Empirical evaluation through large-scale randomized controlled trials demonstrates that the DFL policy reduced cumulative engagement drops by 31% relative to the current standard of care, significantly outperforming the Two-Stage model. Crucially, the studies also confirmed that this increased program engagement translates directly into statistically significant improvements in real-world health behaviors, notably the continued consumption of vital iron and calcium supplements by new mothers. Ultimately, the SAHELI project provides a scalable blueprint for applying sequential decision-making AI to optimize resource allocation in health programs.

AI · Mar 25, 2025 · Code
LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation

Sarah Martinson, Lingkai Kong, Cheol Woo Kim et al.

Agent-based simulation is crucial for modeling complex human behavior, yet traditional approaches require extensive domain knowledge and large datasets. In data-scarce healthcare settings where historic and counterfactual data are limited, large language models (LLMs) offer a promising alternative by leveraging broad world knowledge. This study examines an LLM-driven simulation of a maternal mobile health program, predicting beneficiaries' listening behavior when they receive health information via automated messages (control) or live representatives (intervention). Since uncertainty quantification is critical for decision-making in health interventions, we propose an LLM epistemic uncertainty estimation method based on binary entropy across multiple samples. We enhance model robustness through ensemble approaches, improving F1 score and model calibration compared to individual models. Beyond direct evaluation, we take a decision-focused approach, demonstrating how LLM predictions inform intervention feasibility and trial implementation in data-limited settings. The proposed method extends to public health, disaster response, and other domains requiring rapid intervention assessment under severe data constraints. All code and prompts used for this work can be found at https://github.com/sarahmart/LLM-ABS-ARMMAN-prediction.
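As a rough illustration of the uncertainty measure mentioned in this abstract, the sketch below computes the binary entropy of the empirical positive rate across repeated samples for one beneficiary. The abstract does not spell out the exact formula, so treating each sample as a 0/1 prediction and taking the entropy of their mean is an assumption, and both function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def sample_uncertainty(samples: list[int]) -> float:
    """Binary entropy of the mean of repeated 0/1 predictions.

    Assumption: each entry is one sampled prediction (1 = "listens",
    0 = "does not listen") for the same beneficiary and prompt.
    """
    if not samples:
        raise ValueError("need at least one sample")
    p_hat = sum(samples) / len(samples)
    return binary_entropy(p_hat)


# Unanimous samples give zero uncertainty; an even split gives one bit.
confident = sample_uncertainty([1, 1, 1, 1])
uncertain = sample_uncertainty([1, 0, 1, 0])
```

Under this reading, predictions on which the sampled outputs disagree most receive the highest uncertainty and could be flagged for caution before they feed into intervention decisions.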

LG · Dec 11, 2024 · Code
IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health

Gauri Jain, Pradeep Varakantham, Haifeng Xu et al.

Public health practitioners often have the goal of monitoring patients and maximizing patients' time spent in "favorable" or healthy states while being constrained to using limited resources. Restless multi-armed bandits (RMAB) are an effective model to solve this problem as they help allocate limited resources among many agents under resource constraints, where patients behave differently depending on whether they are intervened on or not. However, RMABs assume the reward function is known. This is unrealistic in many public health settings because patients face unique challenges and it is impossible for a human to know who is most deserving of any intervention at such a large scale. To address this shortcoming, this paper is the first to present the use of inverse reinforcement learning (IRL) to learn desired rewards for RMABs, and we demonstrate improved outcomes in a maternal and child health telehealth program. First, we allow public health experts to specify their goals at an aggregate or population level and propose an algorithm to design expert trajectories at scale based on those goals. Second, our algorithm WHIRL uses gradient updates to optimize the objective, allowing for efficient and accurate learning of RMAB rewards. Third, we compare with existing baselines and outperform them in terms of run-time and accuracy. Finally, we evaluate and show the usefulness of WHIRL on thousands of beneficiaries from a real-world maternal and child health setting in India. We publicly release our code here: https://github.com/Gjain234/WHIRL.

MA · Feb 22, 2024
A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health

Nikhil Behari, Edwin Zhang, Yunfan Zhao et al.

Restless multi-armed bandits (RMAB) have demonstrated success in optimizing resource allocation for large beneficiary populations in public health settings. Unfortunately, RMAB models lack flexibility to adapt to evolving public health policy priorities. Concurrently, Large Language Models (LLMs) have emerged as adept automated planners across domains of robotic control and navigation. In this paper, we propose a Decision Language Model (DLM) for RMABs, enabling dynamic fine-tuning of RMAB policies in public health settings using human-language commands. We propose using LLMs as automated planners to (1) interpret human policy preference prompts, (2) propose reward functions as code for a multi-agent RMAB environment, and (3) iterate on the generated reward functions using feedback from grounded RMAB simulations. We illustrate the application of DLM in collaboration with ARMMAN, an India-based non-profit promoting preventative care for pregnant mothers, that currently relies on RMAB policies to optimally allocate health worker calls to low-resource populations. We conduct a technology demonstration in simulation using the Gemini Pro model, showing DLM can dynamically shape policy outcomes using only human prompts as input.

AI · Dec 10, 2024
Towards Foundation-model-based Multiagent System to Accelerate AI for Social Impact

Yunfan Zhao, Niclas Boehmer, Aparna Taneja et al.

AI for social impact (AI4SI) offers significant potential for addressing complex societal challenges in areas such as public health, agriculture, education, conservation, and public safety. However, existing AI4SI research is often labor-intensive and resource-demanding, limiting its accessibility and scalability; the standard approach is to design a (base-level) system tailored to a specific AI4SI problem. We propose the development of a novel meta-level multi-agent system designed to accelerate the development of such base-level systems, thereby reducing the computational cost and the burden on social impact domain experts and AI researchers. Leveraging advancements in foundation models and large language models, our proposed approach focuses on resource allocation problems providing help across the full AI4SI pipeline from problem formulation over solution design to impact evaluation. We highlight the ethical considerations and challenges inherent in deploying such systems and emphasize the importance of a human-in-the-loop approach to ensure the responsible and effective application of AI systems.

LG · Feb 19, 2024
Evaluating the Effectiveness of Index-Based Treatment Allocation

Niclas Boehmer, Yash Nair, Sanket Shah et al.

When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies -- that allocate a fixed number of resources to those who need them the most -- by using data from a randomized control trial. Such policies create dependencies between agents, which render the assumptions behind standard statistical tests invalid and limit the effectiveness of estimators. Addressing these challenges, we translate and extend recent ideas from the statistics literature to present an efficient estimator and methods for computing asymptotically correct confidence intervals. This enables us to effectively draw valid statistical conclusions, a critical gap in previous work. Our extensive experiments validate our methodology in practical settings, while also showcasing its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized control trial to evaluate different ML allocation policies in the context of an mHealth program, drawing previously invisible conclusions.
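A minimal sketch of an index-based allocation policy, and of why it couples agents' outcomes, is given below. The function name, the dictionary representation, and the alphabetical tie-break are illustrative assumptions and are not taken from the paper.

```python
def allocate_top_k(indices: dict[str, float], budget: int) -> list[str]:
    """Give the resource to the `budget` agents with the highest index.

    Ties are broken alphabetically by agent name so the result is
    deterministic (a choice made for this sketch only).
    """
    ranked = sorted(indices, key=lambda agent: (-indices[agent], agent))
    return ranked[:max(budget, 0)]


# With a budget of two, "b" and "c" are treated.
before = allocate_top_k({"a": 0.2, "b": 0.9, "c": 0.5}, budget=2)

# Raising only agent "a"'s index pushes "c" out of the treated set,
# even though nothing about "c" changed.
after = allocate_top_k({"a": 0.7, "b": 0.9, "c": 0.5}, budget=2)
```

The second call shows the dependency the abstract refers to: whether one agent is treated depends on every other agent's index, so treatment assignments are not independent across agents.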

HC · May 23, 2024
Preliminary Study of the Impact of AI-Based Interventions on Health and Behavioral Outcomes in Maternal Health Programs

Arpan Dasgupta, Niclas Boehmer, Neha Madhiwalla et al.

Automated voice calls are an effective method of delivering maternal and child health information to mothers in underserved communities. One method to fight dwindling listenership is through an intervention in which health workers make live service calls. Previous work has shown that we can use AI to identify beneficiaries whose listenership gets the greatest boost from an intervention. It has also been demonstrated that listening to the automated voice calls consistently leads to improved health outcomes for the beneficiaries of the program. These two observations combined suggest the positive effect of AI-based intervention scheduling on behavioral and health outcomes. This study analyzes the relationship between the two. Specifically, we are interested in mothers' health knowledge in the post-natal period, measured through survey questions. We present evidence that improved listenership through AI-scheduled interventions leads to a better understanding of key health issues during pregnancy and infancy. This improved understanding has the potential to benefit the health outcomes of mothers and their babies.

CY · May 14, 2024
Improving Health Information Access in the World's Largest Maternal Mobile Health Program via Bandit Algorithms

Arshika Lalan, Shresth Verma, Paula Rodriguez Diaz et al.

Harnessing the widespread availability of cell phones, many nonprofits have launched mobile health (mHealth) programs to deliver information via voice or text to beneficiaries in underserved communities, with maternal and infant health being a key area of such mHealth programs. Unfortunately, dwindling listenership is a major challenge, requiring targeted interventions using limited resources. This paper focuses on Kilkari, the world's largest mHealth program for maternal and child care - with over 3 million active subscribers at a time - launched by India's Ministry of Health and Family Welfare (MoHFW) and run by the non-profit ARMMAN. We present a system called CHAHAK that aims to reduce automated dropouts as well as boost engagement with the program through the strategic allocation of interventions to beneficiaries. Past work in a similar domain has focused on a much smaller scale mHealth program and used Markovian restless multi-armed bandits to optimize a single limited intervention resource. However, this paper demonstrates the challenges in adopting a Markovian approach in Kilkari; therefore, CHAHAK instead relies on non-Markovian time-series restless bandits, and optimizes multiple interventions to improve listenership. We use real Kilkari data from the Odisha state in India to show CHAHAK's effectiveness in harnessing multiple interventions to boost listenership, benefiting marginalized communities. When deployed, CHAHAK will assist the largest maternal mHealth program to date.

LG · Feb 7, 2024
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits

Biyonka Liang, Lily Xu, Aparna Taneja et al.

Public health programs often provide interventions to encourage program adherence, and effectively allocating interventions is vital for producing the greatest overall health outcomes, especially in underserved communities where resources are limited. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that novelly combines techniques in Bayesian modeling with Thompson sampling to flexibly model the complex RMAB settings present in public health program adherence problems, namely context and non-stationarity. BCoR's key strength is the ability to leverage shared information within and between arms to learn the unknown RMAB transition dynamics quickly in intervention-scarce settings with relatively short time horizons, which are common in public health applications. Empirically, BCoR achieves substantially higher finite-sample performance over a range of experimental settings, including a setting using real-world adherence data that was developed in collaboration with ARMMAN, an NGO in India which runs a large-scale maternal mHealth program, showcasing BCoR's practical utility and potential for real-world deployment.

CL · Jan 20, 2025
Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness

Ambreesh Parthasarathy, Chandrasekar Subramanian, Ganesh Senrayan et al.

Restless Multi-Armed Bandits (RMABs) have been successfully applied to resource allocation problems in a variety of settings, including public health. With the rapid development of powerful large language models (LLMs), they are increasingly used to design reward functions to better match human preferences. Recent work has shown that LLMs can be used to tailor automated allocation decisions to community needs using language prompts. However, this has been studied primarily for English prompts and with a focus on task performance only. This can be an issue since grassroots workers, especially in developing countries like India, prefer to work in local languages, some of which are low-resource. Further, given the nature of the problem, biases along population groups unintended by the user are also undesirable. In this work, we study the effects on both task performance and fairness when the DLM algorithm, a recent work on using LLMs to design reward functions for RMABs, is prompted with non-English language commands. Specifically, we run the model on a synthetic environment for various prompts translated into multiple languages. The prompts themselves vary in complexity. Our results show that the LLM-proposed reward functions are significantly better when prompted in English compared to other languages. We also find that the exact phrasing of the prompt impacts task performance. Further, as prompt complexity increases, performance worsens for all languages; however, it is more robust with English prompts than with lower-resource languages. On the fairness side, we find that low-resource languages and more complex prompts are both highly likely to create unfairness along unintended dimensions.

AI · Jul 28, 2025
Beyond Listenership: AI-Predicted Interventions Drive Improvements in Maternal Health Behaviours

Arpan Dasgupta, Sarvesh Gharat, Neha Madhiwalla et al.

Automated voice calls with health information are a proven method for disseminating maternal and child health information among beneficiaries and are deployed in several programs around the world. However, these programs often suffer from beneficiary dropoffs and poor engagement. In previous work, through real-world trials, we showed that an AI model, specifically a restless bandit model, could identify beneficiaries who would benefit most from live service call interventions, preventing dropoffs and boosting engagement. However, one key question has remained open so far: does such improved listenership via AI-targeted interventions translate into beneficiaries' improved knowledge and health behaviors? We present a first study that shows not only listenership improvements due to AI interventions, but also simultaneously links these improvements to health behavior changes. Specifically, we demonstrate that AI-scheduled interventions, which enhance listenership, lead to statistically significant improvements in beneficiaries' health behaviors such as taking iron or calcium supplements in the postnatal period, as well as understanding of critical health topics during pregnancy and infancy. This underscores the potential of AI to drive meaningful improvements in maternal and child health.

AI · Jul 22, 2025
Learning to Call: A Field Trial of a Collaborative Bandit Algorithm for Improved Message Delivery in Mobile Maternal Health

Arpan Dasgupta, Mizhaan Maniyar, Awadhesh Srivastava et al.

Mobile health (mHealth) programs utilize automated voice messages to deliver health information, particularly to underserved communities. Such programs have demonstrated the effectiveness of mobile technology for disseminating crucial health information to these populations, improving health outcomes through increased awareness and behavioral change. India's Kilkari program delivers vital maternal health information via weekly voice calls to millions of mothers. However, the current random call scheduling often results in missed calls and reduced message delivery. This study presents a field trial of a collaborative bandit algorithm designed to optimize call timing by learning individual mothers' preferred call times. We deployed the algorithm with around $6500$ Kilkari participants as a pilot study, comparing its performance to the baseline random calling approach. Our results demonstrate a statistically significant improvement in call pick-up rates with the bandit algorithm, indicating its potential to enhance message delivery and impact millions of mothers across India. This research highlights the efficacy of personalized scheduling in mobile health interventions and underscores the potential of machine learning to improve maternal health outreach at scale.

LG · Oct 28, 2024
Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program

Arpan Dasgupta, Gagan Jain, Arun Suggala et al.

Mobile health (mHealth) programs face a critical challenge in optimizing the timing of automated health information calls to beneficiaries. This challenge has been formulated as a collaborative multi-armed bandit problem, requiring online learning of a low-rank reward matrix. Existing solutions often rely on heuristic combinations of offline matrix completion and exploration strategies. In this work, we propose a principled Bayesian approach using Thompson Sampling for this collaborative bandit problem. Our method leverages prior information through efficient Gibbs sampling for posterior inference over the low-rank matrix factors, enabling faster convergence. We demonstrate significant improvements over state-of-the-art baselines on a real-world dataset from the world's largest maternal mHealth program. Our approach achieves a $16\%$ reduction in the number of calls compared to existing methods and a $47\%$ reduction compared to the deployed random policy. This efficiency gain translates to a potential increase in program capacity by $0.5$-$1.4$ million beneficiaries, granting them access to vital ante-natal and post-natal care information. Furthermore, we observe a $7\%$ and $29\%$ improvement in beneficiary retention (an extremely hard metric to impact) compared to state-of-the-art and deployed baselines, respectively. Synthetic simulations further demonstrate the superiority of our approach, particularly in low-data regimes and in effectively utilizing prior information. We also provide a theoretical analysis of our algorithm in a special setting using Eluder dimension.

AI · Mar 8, 2024
Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning

Sanket Shah, Arun Suggala, Milind Tambe et al.

The declining participation of beneficiaries over time is a key concern in public health programs. A popular strategy for improving retention is to have health workers "intervene" on beneficiaries at risk of dropping out. However, the availability and time of these health workers are limited resources. As a result, there has been a line of research on optimizing these limited intervention resources using Restless Multi-Armed Bandits (RMABs). The key technical barrier to using this framework in practice lies in the need to estimate the beneficiaries' RMAB parameters from historical data. Recent research has shown that Decision-Focused Learning (DFL), which focuses on maximizing the beneficiaries' adherence rather than predictive accuracy, improves the performance of intervention targeting using RMABs. Unfortunately, these gains come at a high computational cost because of the need to solve and evaluate the RMAB in each DFL training step. In this paper, we provide a principled way to exploit the structure of RMABs to speed up intervention planning by cleverly decoupling the planning for different beneficiaries. We use real-world data from an Indian NGO, ARMMAN, to show that our approach is up to two orders of magnitude faster than the state-of-the-art approach while also yielding superior model performance. This would enable the NGO to scale up deployments using DFL to potentially millions of mothers, ultimately advancing progress toward UNSDG 3.1.

AIMay 22, 2023
Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare

Panayiotis Danassis, Shresth Verma, Jackson A. Killian et al.

The success of many healthcare programs depends on participants' adherence. We consider the problem of scheduling interventions in low resource settings (e.g., placing timely support calls from health workers) to increase adherence and/or engagement. Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem. Nevertheless, all past RMAB approaches assume that the participants' behaviour follows the Markov property. We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN. Moreover, we extend RMABs to continuous state spaces, a previously understudied area. To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to learn complex patterns and dynamics to predict future states, and (iii) propose the Time-series Arm Ranking Index (TARI) policy, a novel algorithm that selects the RMAB arms that will benefit the most from an intervention, given our future state predictions. We evaluate our approach on both synthetic data and a secondary analysis on real data from ARMMAN, and demonstrate a significant increase in engagement compared to the SOTA, deployed Whittle index solution. This translates to 16.3 hours of additional content listened, 90.8% more engagement drops prevented, and reaching more than twice as many high dropout-risk beneficiaries.

LGFeb 2, 2022
Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

Kai Wang, Shresth Verma, Aditya Mate et al.

This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes.

AISep 22, 2021
Facilitating human-wildlife cohabitation through conflict prediction

Susobhan Ghosh, Pradeep Varakantham, Aniket Bhatkhande et al.

With increasing world population and expanded use of forests as cohabited regions, interactions and conflicts with wildlife are increasing, leading to large-scale loss of lives (animal and human) and livelihoods (economic). While community knowledge is valuable, forest officials and conservation organisations can greatly benefit from predictive analysis of human-wildlife conflict, leading to targeted interventions that can potentially help save lives and livelihoods. However, the problem of prediction is a complex socio-technical problem in the context of limited data in low-resource regions. Identifying the "right" features to make accurate predictions of conflicts at the required spatial granularity using a sparse conflict training dataset is the key challenge that we address in this paper. Specifically, we do an illustrative case study on human-wildlife conflicts in the Bramhapuri Forest Division in Chandrapur, Maharashtra, India. Most existing work has considered human-wildlife conflicts in protected areas and to the best of our knowledge, this is the first effort at prediction of human-wildlife conflicts in unprotected areas and using those predictions for deploying interventions on the ground.

LGSep 16, 2021
Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health

Aditya Mate, Lovish Madaan, Aparna Taneja et al.

The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing engagement drops by ~30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real world public health settings. We are transitioning our RMAB system to the NGO for real-world use.