Sami Jullien

h-index5

4papers

92citations

Novelty43%

AI Score29

Ranked #146,008 of 194,257 authors (top 75%)#32,145 in LG (top 80%)

4 Papers

5.8LGMay 30, 2022

A Simulation Environment and Reinforcement Learning Method for Waste Reduction

Sami Jullien, Mozhdeh Ariannezhad, Paul Groth et al.

In retail (e.g., grocery stores, apparel shops, online retailers), inventory managers have to balance short-term risk (no items to sell) with long-term-risk (over ordering leading to product waste). This balancing task is made especially hard due to the lack of information about future customer purchases. In this paper, we study the problem of restocking a grocery store's inventory with perishable items over time, from a distributional point of view. The objective is to maximize sales while minimizing waste, with uncertainty about the actual consumption by costumers. This problem is of a high relevance today, given the growing demand for food and the impact of food waste on the environment, the economy, and purchasing power. We frame inventory restocking as a new reinforcement learning task that exhibits stochastic behavior conditioned on the agent's actions, making the environment partially observable. We make two main contributions. First, we introduce a new reinforcement learning environment, RetaiL, based on real grocery store data and expert knowledge. This environment is highly stochastic, and presents a unique challenge for reinforcement learning practitioners. We show that uncertainty about the future behavior of the environment is not handled well by classical supply chain algorithms, and that distributional approaches are a good way to account for the uncertainty. Second, we introduce GTDQN, a distributional reinforcement learning algorithm that learns a generalized Tukey Lambda distribution over the reward space. GTDQN provides a strong baseline for our environment. It outperforms other distributional reinforcement learning approaches in this partially observable setting, in both overall reward and reduction of generated waste.

11.1AINov 1, 2021Code

Reproducibility as a Mechanism for Teaching Fairness, Accountability, Confidentiality, and Transparency in Artificial Intelligence

Ana Lucic, Maurits Bleeker, Sami Jullien et al.

In this work, we explain the setup for a technical, graduate-level course on Fairness, Accountability, Confidentiality, and Transparency in Artificial Intelligence (FACT-AI) at the University of Amsterdam, which teaches FACT-AI concepts through the lens of reproducibility. The focal point of the course is a group project based on reproducing existing FACT-AI algorithms from top AI conferences and writing a corresponding report. In the first iteration of the course, we created an open source repository with the code implementations from the group projects. In the second iteration, we encouraged students to submit their group projects to the Machine Learning Reproducibility Challenge, resulting in 9 reports from our course being accepted for publication in the ReScience journal. We reflect on our experience teaching the course over two years, where one year coincided with a global pandemic, and propose guidelines for teaching FACT-AI through reproducibility in graduate-level AI study programs. We hope this can be a useful resource for instructors who want to set up similar courses in the future.

8.8LGMay 26, 2023

Distributional Reinforcement Learning with Dual Expectile-Quantile Regression

Sami Jullien, Romain Deffayet, Jean-Michel Renders et al.

Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and extracts rich feedback from environment samples. The commonly used quantile regression approach to distributional RL -- based on asymmetric $L_1$ losses -- provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, asymmetric hybrid $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our proposed operator converges to the distributional Bellman operator in the limit of infinite estimated quantile and expectile fractions, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.

15.6IRSep 29, 2021Code

A Next Basket Recommendation Reality Check

Ming Li, Sami Jullien, Mozhdeh Ariannezhad et al.

The goal of a next basket recommendation (NBR) system is to recommend items for the next basket for a user, based on the sequence of their prior baskets. Recently, a number of methods with complex modules have been proposed that claim state-of-the-art performance. They rarely look into the predicted basket and just provide intuitive reasons for the observed improvements, e.g., better representation, capturing intentions or relations, etc. We provide a novel angle on the evaluation of next basket recommendation methods, centered on the distinction between repetition and exploration: the next basket is typically composed of previously consumed items (i.e., repeat items) and new items (i.e, explore items). We propose a set of metrics that measure the repeat/explore ratio and performance of NBR models. Using these new metrics, we analyze state-of-the-art NBR models. The results of our analysis help to clarify the extent of the actual progress achieved by existing NBR methods as well as the underlying reasons for the improvements. Overall, our work sheds light on the evaluation problem of NBR and provides useful insights into the model design for this task.