IRAug 8, 2023
AdaptEx: A Self-Service Contextual Bandit PlatformWilliam Black, Ercument Ilhan, Andrea Marchini et al.
This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution to improve user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate towards optimal product solutions quickly, even in ever-changing content and continuous "cold start" situations gracefully.
IRSep 2, 2024
TRACE: Transformer-based user Representations from Attributed Clickstream Event sequencesWilliam Black, Alexander Manlove, Jack Pennington et al.
For users navigating travel e-commerce websites, the process of researching products and making a purchase often results in intricate browsing patterns that span numerous sessions over an extended period of time. The resulting clickstream data chronicle these user journeys and present valuable opportunities to derive insights that can significantly enhance personalized recommendations. We introduce TRACE, a novel transformer-based approach tailored to generate rich user embeddings from live multi-session clickstreams for real-time recommendation applications. Prior works largely focus on single-session product sequences, whereas TRACE leverages site-wide page view sequences spanning multiple user sessions to model long-term engagement. Employing a multi-task learning framework, TRACE captures comprehensive user preferences and intents distilled into low-dimensional representations. We demonstrate TRACE's superior performance over vanilla transformer and LLM-style architectures through extensive experiments on a large-scale travel e-commerce dataset of real user journeys, where the challenges of long page-histories and sparse targets are particularly prevalent. Visualizations of the learned embeddings reveal meaningful clusters corresponding to latent user states and behaviors, highlighting TRACE's potential to enhance recommendation systems by capturing nuanced user interactions and preferences
LGApr 17, 2021
Action Advising with Advice Imitation in Deep Reinforcement LearningErcument Ilhan, Jeremy Gow, Diego Perez-Liebana
Action advising is a peer-to-peer knowledge exchange technique built on the teacher-student paradigm to alleviate the sample inefficiency problem in deep reinforcement learning. Recently proposed student-initiated approaches have obtained promising results. However, due to being in the early stages of development, these also have some substantial shortcomings. One of the abilities that are absent in the current methods is further utilising advice by reusing, which is especially crucial in the practical settings considering the budget and cost constraints in peer-to-peer. In this study, we present an approach to enable the student agent to imitate previously acquired advice to reuse them directly in its exploration policy, without any interventions in the learning mechanism itself. In particular, we employ a behavioural cloning module to imitate the teacher policy and use dropout regularisation to have a notion of epistemic uncertainty to keep track of which state-advice pairs are actually collected. As the results of experiments we conducted in three Atari games show, advice reusing via generalisation is indeed a feasible option in deep RL and our approach can successfully achieve this while significantly improving the learning performance, even when paired with a simple early advising heuristic.
LGApr 17, 2021
Learning on a Budget via Teacher ImitationErcument Ilhan, Jeremy Gow, Diego Perez-Liebana
Deep Reinforcement Learning (RL) techniques can benefit greatly from leveraging prior experience, which can be either self-generated or acquired from other entities. Action advising is a framework that provides a flexible way to transfer such knowledge in the form of actions between teacher-student peers. However, due to the realistic concerns, the number of these interactions is limited with a budget; therefore, it is crucial to perform these in the most appropriate moments. There have been several promising studies recently that address this problem setting especially from the student's perspective. Despite their success, they have some shortcomings when it comes to the practical applicability and integrity as an overall solution to the learning from advice challenge. In this paper, we extend the idea of advice reusing via teacher imitation to construct a unified approach that addresses both advice collection and advice utilisation problems. We also propose a method to automatically tune the relevant hyperparameters of these components on-the-fly to make it able to adapt to any task with minimal human intervention. The experiments we performed in 5 different Atari games verify that our algorithm either surpasses or performs on-par with its top competitors while being far simpler to be employed. Furthermore, its individual components are also found to be providing significant advantages alone.
LGOct 1, 2020
Student-Initiated Action Advising via Advice NoveltyErcument Ilhan, Jeremy Gow, Diego Perez-Liebana
Action advising is a budget-constrained knowledge exchange mechanism between teacher-student peers that can help tackle exploration and sample inefficiency problems in deep reinforcement learning (RL). Most recently, student-initiated techniques that utilise state novelty and uncertainty estimations have obtained promising results. However, the approaches built on these estimations have some potential weaknesses. First, they assume that the convergence of the student's RL model implies less need for advice. This can be misleading in scenarios with teacher absence early on where the student is likely to learn suboptimally by itself; yet also ignore the teacher's assistance later. Secondly, the delays between encountering states and having them to take effect in the RL model updates in presence of the experience replay dynamics cause a feedback lag in what the student actually needs advice for. We propose a student-initiated algorithm that alleviates these by employing Random Network Distillation (RND) to measure the novelty of a piece of advice. Furthermore, we perform RND updates only for the advised states to ensure that the student's own learning does not impair its ability to leverage the teacher. Experiments in GridWorld and MinAtar show that our approach performs on par with the state-of-the-art and demonstrates significant advantages in the scenarios where the existing methods are prone to fail.