Adaptive Model Selection Framework: An Application to Airline Pricing
This work addresses the exploration-exploitation dilemma that algorithm designers face in online airline pricing. Its contribution is incremental, since it applies an existing bandit method to a specific domain.
The paper addresses how to select among multiple airline pricing models for each customer request. It introduces an adaptive meta-decision framework based on Thompson sampling which, in offline simulations, improved expected revenue per offer by 43% and conversion score by 58% relative to uniformly random selection.
Multiple machine learning and prediction models are often used for the same prediction or recommendation task. In our recent work, in which we develop and deploy airline ancillary pricing models in an online setting, we found that among the pricing models developed, no single model clearly dominates the others for all incoming customer requests. Thus, as algorithm designers, we face an exploration-exploitation dilemma. In this work, we introduce an adaptive meta-decision framework that uses Thompson sampling, a popular multi-armed bandit solution method, to route customer requests to various pricing models based on their online performance. We show that this adaptive approach outperforms a uniformly random selection policy, improving the expected revenue per offer by 43% and the conversion score by 58% in an offline simulation.
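To make the routing idea concrete, the following is a minimal sketch of Thompson sampling over a set of pricing models. It is not the implementation used in this work. It assumes a binary reward (whether the offer converted) with a Beta(1, 1) prior per model, whereas the framework described here also considers revenue. The names `ThompsonRouter` and `simulate`, and the conversion rates in the usage example, are hypothetical and chosen only for illustration.

```python
import random


class ThompsonRouter:
    """Routes each request to one of several pricing models.

    Assumption: the reward is binary (converted or not), so each model's
    conversion rate gets a Beta posterior, starting from Beta(1, 1).
    """

    def __init__(self, n_models, seed=None):
        self.successes = [1.0] * n_models  # Beta alpha parameters
        self.failures = [1.0] * n_models   # Beta beta parameters
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible conversion rate per model from its posterior
        # and route the request to the model with the largest draw.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, model, converted):
        # Update only the posterior of the model that served the request.
        if converted:
            self.successes[model] += 1
        else:
            self.failures[model] += 1


def simulate(true_rates, n_requests, seed=0):
    """Offline simulation with hypothetical, fixed conversion rates.

    Returns how many requests were routed to each model.
    """
    router = ThompsonRouter(len(true_rates), seed=seed)
    env = random.Random(seed + 1)  # separate stream for customer behaviour
    counts = [0] * len(true_rates)
    for _ in range(n_requests):
        model = router.select()
        converted = env.random() < true_rates[model]
        router.update(model, converted)
        counts[model] += 1
    return counts


if __name__ == "__main__":
    # Three hypothetical pricing models with made-up conversion rates.
    print(simulate([0.05, 0.10, 0.30], n_requests=5000))
```

Sampling from the posterior, rather than always picking the model with the best observed rate, is what balances exploration and exploitation: a model with few observations has a wide posterior and is still selected occasionally, while traffic concentrates on the stronger models as evidence accumulates.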