Flavian Vasile

IR
h-index13
34papers
1,169citations
Novelty50%
AI Score54

34 Papers

56.6LGMay 27
Off-Policy Learning to Reason Works Because It Is More Pessimistic Than You Think

Otmane Sakhi, Aleksei Arzhantsev, Imad Aouali et al.

Large scale reinforcement learning has become a central tool for improving reasoning in large language models. At this scale, generation is often lagged or asynchronous, so updates are performed on data collected by older policies. This makes learning inherently off-policy. Most existing approaches nevertheless remain rooted in PPO-style trust-region objectives, treating training as approximately on-policy and using importance weights to correct distribution mismatch. These corrections can introduce high variance, destabilize optimization, and accelerate entropy collapse. Recent work suggests an alternative: rather than correcting the mismatch, one can embrace off-policy data and remove importance weights, often yielding stronger algorithms. In this paper, we provide an intuitive construction of off-policy objectives that include successful off-policy objectives and show that their effectiveness can be understood through implicit pessimism: they optimize toward target policies that are more conservative than their nominal objectives suggest. This perspective explains why some particular implementation choices improve stability: they implicitly control the effective target distribution. We then propose a principled modification that stabilize this induced distribution and improve off-policy learning.

IRSep 8, 2023
AdBooster: Personalized Ad Creative Generation using Stable Diffusion Outpainting

Veronika Shilova, Ludovic Dos Santos, Flavian Vasile et al.

In digital advertising, the selection of the optimal item (recommendation) and its best creative presentation (creative optimization) have traditionally been considered separate disciplines. However, both contribute significantly to user satisfaction, underpinning our assumption that it relies on both an item's relevance and its presentation, particularly in the case of visual creatives. In response, we introduce the task of {\itshape Generative Creative Optimization (GCO)}, which proposes the use of generative models for creative generation that incorporate user interests, and {\itshape AdBooster}, a model for personalized ad creatives based on the Stable Diffusion outpainting architecture. This model uniquely incorporates user interests both during fine-tuning and at generation time. To further improve AdBooster's performance, we also introduce an automated data augmentation pipeline. Through our experiments on simulated data, we validate AdBooster's effectiveness in generating more relevant creatives than default product images, showing its potential of enhancing user engagement.

IRAug 10, 2022
Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation

Imad Aouali, Achraf Ait Sidi Hammou, Otmane Sakhi et al.

We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows off-policy estimation of the reward in the scenario where the user interacts with at most one item from a slate of K items. We show that the probability of a slate being successful can be learned efficiently by combining the reward, whether the user successfully interacted with the slate, and the rank, the item that was selected within the slate. PRR outperforms existing off-policy reward optimizing methods and is far more scalable to large action spaces. Moreover, PRR allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable in low latency domains such as computational advertising.

IRSep 18, 2022
Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation

Imad Aouali, Amine Benhalloum, Martin Bompaire et al.

Both in academic and industry-based research, online evaluation methods are seen as the golden standard for interactive applications like recommendation systems. Naturally, the reason for this is that we can directly measure utility metrics that rely on interventions, being the recommendations that are being shown to users. Nevertheless, online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures. In industry, offline metrics are often used as a first-line evaluation to generate promising candidate models to evaluate online. In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods. Two classes of offline metrics exist: proxy-based methods, and counterfactual methods. The first class is often poorly correlated with the online metrics we care about, and the latter class only provides theoretical guarantees under assumptions that cannot be fulfilled in real-world environments. Here, we make the case that simulation-based comparisons provide ways forward beyond offline metrics, and argue that they are a preferable means of evaluation.

28.8CVMay 19
SphericalDreamer: Generating Navigable Immersive 3D Worlds with Panorama Fusion

Antoine Schnepf, Karim Kassab, Flavian Vasile et al.

The generation of immersive and navigable 3D environments is increasingly prevalent with the growing adoption of virtual reality and 3D content. However, recent methods face a fundamental limitation: they cannot produce 3D worlds that simultaneously (i) are navigable over long-range spatial extents and (ii) cover the complete omnidirectional field of view ($360^\circ$ horizontally and $180^\circ$ vertically). To address this challenge, we introduce SphericalDreamer, a method for generating fully immersive and long-range 3D outdoor environments from textual prompts. Our approach is built on the generation of multiple panoramic images, which are subsequently lifted into 3D and fused together while maintaining visual and geometric consistency. SphericalDreamer produces highly detailed, fully immersive 3D environments, while substantially improving scale and navigability compared to prior approaches.

CVOct 30, 2024Code
Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder

Antoine Schnepf, Karim Kassab, Jean-Yves Franceschi et al.

While pre-trained image autoencoders are increasingly utilized in computer vision, the application of inverse graphics in 2D latent spaces has been under-explored. Yet, besides reducing the training and rendering complexity, applying inverse graphics in the latent space enables a valuable interoperability with other latent-based 2D methods. The major challenge is that inverse graphics cannot be directly applied to such image latent spaces because they lack an underlying 3D geometry. In this paper, we propose an Inverse Graphics Autoencoder (IG-AE) that specifically addresses this issue. To this end, we regularize an image autoencoder with 3D-geometry by aligning its latent space with jointly trained latent 3D scenes. We utilize the trained IG-AE to bring NeRFs to the latent space with a latent NeRF training pipeline, which we implement in an open-source extension of the Nerfstudio framework, thereby unlocking latent scene learning for its supported methods. We experimentally confirm that Latent NeRFs trained with IG-AE present an improved quality compared to a standard autoencoder, all while exhibiting training and rendering accelerations with respect to NeRFs trained in the image space. Our project page can be found at https://ig-ae.github.io .

71.4IRMay 11
RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents

Imad Aouali, Flavian Vasile, Otmane Sakhi et al.

LLM recommendation agents increasingly produce structured recommendation reports: sets of items accompanied by natural-language justifications. Yet existing evaluations often reduce this setting to reranking small shortlisted candidate sets or judge reports mainly by semantic plausibility. We introduce Recommendation Atlas (Agentic Tool-Level Assessment for Shopping), or RecoAtlas, a benchmark and toolkit for evaluating shopping agents with behavior-grounded metrics. RecoAtlas complements held-out interaction metrics with learned utility proxies for relevance, complementarity, and diversity derived from interaction data, while separately measuring semantic coherence and explanation quality. Its controlled tool environment exposes agents to either semantic, behavior-aligned, or faulty tools, enabling diagnosis of whether performance gains arise from stronger reasoning, better signals, or more effective tool-use policies. Across controlled experiments, we show that RecoAtlas exhibits key properties of a meaningful benchmark for agentic systems: performance scales with model capacity and test-time compute, improves with stronger and better-aligned tools, degrades under noisy or misaligned signals, and reveals that semantic plausibility does not necessarily capture behavior-grounded utility. RecoAtlas provides a foundation for developing and evaluating shopping assistants that optimize not only for plausible recommendations, but also for coherent, behaviorally grounded recommendation sets.

CVOct 31, 2024Code
Fused-Planes: Improving Planar Representations for Learning Large Sets of 3D Scenes

Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi et al.

To learn large sets of scenes, Tri-Planes are commonly employed for their planar structure that enables an interoperability with image models, and thus diverse 3D applications. However, this advantage comes at the cost of resource efficiency, as Tri-Planes are not the most computationally efficient option. In this paper, we introduce Fused-Planes, a new planar architecture that improves Tri-Planes resource-efficiency in the framework of learning large sets of scenes, which we call "multi-scene inverse graphics". To learn a large set of scenes, our method divides it into two subsets and operates as follows: (i) we train the first subset of scenes jointly with a compression model, (ii) we use that compression model to learn the remaining scenes. This compression model consists of a 3D-aware latent space in which Fused-Planes are learned, enabling a reduced rendering resolution, and shared structures across scenes that reduce scene representation complexity. Fused-Planes present competitive resource costs in multi-scene inverse graphics, while preserving Tri-Planes rendering quality, and maintaining their widely favored planar structure. Our codebase is publicly available as open-source. Our project page can be found at https://fused-planes.github.io .

LGOct 3, 2025
RoiRL: Efficient, Self-Supervised Reasoning with Offline Iterative Reinforcement Learning

Aleksei Arzhantsev, Otmane Sakhi, Flavian Vasile

Reinforcement learning (RL) is central to improving reasoning in large language models (LLMs) but typically requires ground-truth rewards. Test-Time Reinforcement Learning (TTRL) removes this need by using majority-vote rewards, but relies on heavy online RL and incurs substantial computational cost. We propose RoiRL: Reasoning with offline iterative Reinforcement Learning, a family of lightweight offline learning alternatives that can target the same regularized optimal policies. Unlike TTRL, RoiRL eliminates the need to maintain a reference model and instead optimizes weighted log-likelihood objectives, enabling stable training with significantly lower memory and compute requirements. Experimental results show that RoiRL trains to 2.5x faster and consistently outperforms TTRL on reasoning benchmarks, establishing a scalable path to self-improving LLMs without labels.

CVMar 18, 2024
Exploring 3D-aware Latent Spaces for Efficiently Learning Numerous Scenes

Antoine Schnepf, Karim Kassab, Jean-Yves Franceschi et al.

We present a method enabling the scaling of NeRFs to learn a large number of semantically-similar scenes. We combine two techniques to improve the required training time and memory cost per scene. First, we learn a 3D-aware latent space in which we train Tri-Plane scene representations, hence reducing the resolution at which scenes are learned. Moreover, we present a way to share common information across scenes, hence allowing for a reduction of model complexity to learn a particular scene. Our method reduces effective per-scene memory costs by 44% and per-scene time costs by 86% when training 1000 scenes. Our project page can be found at https://3da-ae.github.io .

CVDec 13, 2023
3DGEN: A GAN-based approach for generating novel 3D models from image data

Antoine Schnepf, Flavian Vasile, Ugo Tanielian

The recent advances in text and image synthesis show a great promise for the future of generative models in creative fields. However, a less explored area is the one of 3D model generation, with a lot of potential applications to game design, video production, and physical product design. In our paper, we present 3DGEN, a model that leverages the recent work on both Neural Radiance Fields for object reconstruction and GAN-based image generation. We show that the proposed architecture can generate plausible meshes for objects of the same category as the training images and compare the resulting meshes with the state-of-the-art baselines, leading to visible uplifts in generation quality.

LGSep 2, 2021
What Users Want? WARHOL: A Generative Model for Recommendation

Jules Samaran, Ugo Tanielian, Romain Beaumont et al.

Current recommendation approaches help online merchants predict, for each visiting user, which subset of their existing products is the most relevant. However, besides being interested in matching users with existing products, merchants are also interested in understanding their users' underlying preferences. This could indeed help them produce or acquire better matching products in the future. We argue that existing recommendation models cannot directly be used to predict the optimal combination of features that will make new products serve better the needs of the target audience. To tackle this, we turn to generative models, which allow us to learn explicitly distributions over product feature combinations both in text and visual space. We develop WARHOL, a product generation and recommendation architecture that takes as input past user shopping activity and generates relevant textual and visual descriptions of novel products. We show that WARHOL can approach the performance of state-of-the-art recommendation models, while being able to generate entirely new products that are relevant to the given user profiles.

LGJul 26, 2021
Combining Reward and Rank Signals for Slate Recommendation

Imad Aouali, Sergey Ivanov, Mike Gartrell et al.

We consider the problem of slate recommendation, where the recommender system presents a user with a collection or slate composed of K recommended items at once. If the user finds the recommended items appealing then the user may click and the recommender system receives some feedback. Two pieces of information are available to the recommender system: was the slate clicked? (the reward), and if the slate was clicked, which item was clicked? (rank). In this paper, we formulate several Bayesian models that incorporate the reward signal (Reward model), the rank signal (Rank model), or both (Full model), for non-personalized slate recommendation. In our experiments, we analyze performance gains of the Full model and show that it achieves significantly lower error as the number of products in the catalog grows or as the slate size increases.

MLNov 13, 2020
Improving Offline Contextual Bandits with Distributional Robustness

Otmane Sakhi, Louis Faury, Flavian Vasile

This paper extends the Distributionally Robust Optimization (DRO) approach for offline contextual bandits. Specifically, we leverage this framework to introduce a convex reformulation of the Counterfactual Risk Minimization principle. Besides relying on convex programs, our approach is compatible with stochastic optimization, and can therefore be readily adapted tothe large data regime. Our approach relies on the construction of asymptotic confidence intervals for offline contextual bandits through the DRO framework. By leveraging known asymptotic results of robust estimators, we also show how to automatically calibrate such confidence intervals, which in turn removes the burden of hyper-parameter selection for policy optimization. We present preliminary empirical results supporting the effectiveness of our approach.

IRSep 1, 2020
From Clicks to Conversions: Recommendation for long-term reward

Philomène Chagniot, Flavian Vasile, David Rohde

Recommender systems are often optimised for short-term reward: a recommendation is considered successful if a reward (e.g. a click) can be observed immediately after the recommendation. The advantage of this framework is that with some reasonable (although questionable) assumptions, it allows familiar supervised learning tools to be used for the recommendation task. However, it means that long-term business metrics, e.g. sales or retention are ignored. In this paper we introduce a framework for modeling long-term rewards in the RecoGym simulation environment. We use this newly introduced functionality to showcase problems introduced by the last-click attribution scheme in the case of conversion-optimized recommendations and propose a simple extension that leads to state-of-the-art results.

MLAug 28, 2020
BLOB : A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals

Otmane Sakhi, Stephen Bonner, David Rohde et al.

A common task for recommender systems is to build a pro le of the interests of a user from items in their browsing history and later to recommend items to the user from the same catalog. The users' behavior consists of two parts: the sequence of items that they viewed without intervention (the organic part) and the sequences of items recommended to them and their outcome (the bandit part). In this paper, we propose Bayesian Latent Organic Bandit model (BLOB), a probabilistic approach to combine the 'or-ganic' and 'bandit' signals in order to improve the estimation of recommendation quality. The bandit signal is valuable as it gives direct feedback of recommendation performance, but the signal quality is very uneven, as it is highly concentrated on the recommendations deemed optimal by the past version of the recom-mender system. In contrast, the organic signal is typically strong and covers most items, but is not always relevant to the recommendation task. In order to leverage the organic signal to e ciently learn the bandit signal in a Bayesian model we identify three fundamental types of distances, namely action-history, action-action and history-history distances. We implement a scalable approximation of the full model using variational auto-encoders and the local re-paramerization trick. We show using extensive simulation studies that our method out-performs or matches the value of both state-of-the-art organic-based recommendation algorithms, and of bandit-based methods (both value and policy-based) both in organic and bandit-rich environments.

MLOct 2, 2019
Reconsidering Analytical Variational Bounds for Output Layers of Deep Networks

Otmane Sakhi, Stephen Bonner, David Rohde et al.

The combination of the re-parameterization trick with the use of variational auto-encoders has caused a sensation in Bayesian deep learning, allowing the training of realistic generative models of images and has considerably increased our ability to use scalable latent variable models. The re-parameterization trick is necessary for models in which no analytical variational bound is available and allows noisy gradients to be computed for arbitrary models. However, for certain standard output layers of a neural network, analytical bounds are available and the variational auto-encoder may be used both without the re-parameterization trick or the need for any Monte Carlo approximation. In this work, we show that using Jaakola and Jordan bound, we can produce a binary classification layer that allows a Bayesian output layer to be trained, using the standard stochastic gradient descent algorithm. We further demonstrate that a latent variable model utilizing the Bouchard bound for multi-class classification allows for fast training of a fully probabilistic latent factor model, even when the number of classes is very large.

IRSep 18, 2019
Learning from Bandit Feedback: An Overview of the State-of-the-art

Olivier Jeunen, Dmytro Mykhaylov, David Rohde et al.

In machine learning we often try to optimise a decision rule that would have worked well over a historical dataset; this is the so called empirical risk minimisation principle. In the context of learning from recommender system logs, applying this principle becomes a problem because we do not have available the reward of decisions we did not do. In order to handle this "bandit-feedback" setting, several Counterfactual Risk Minimisation (CRM) methods have been proposed in recent years, that attempt to estimate the performance of different policies on historical data. Through importance sampling and various variance reduction techniques, these methods allow more robust learning and inference than classical approaches. It is difficult to accurately estimate the performance of policies that frequently perform actions that were infrequently done in the past and a number of different types of estimators have been proposed. In this paper, we review several methods, based on different off-policy estimators, for learning from bandit feedback. We discuss key differences and commonalities among existing approaches, and compare their empirical performance on the RecoGym simulation environment. To the best of our knowledge, this work is the first comparison study for bandit algorithms in a recommender system setting.

MLSep 17, 2019
Relaxed Softmax for learning from Positive and Unlabeled data

Ugo Tanielian, Flavian Vasile

In recent years, the softmax model and its fast approximations have become the de-facto loss functions for deep neural networks when dealing with multi-class prediction. This loss has been extended to language modeling and recommendation, two fields that fall into the framework of learning from Positive and Unlabeled data. In this paper, we stress the different drawbacks of the current family of softmax losses and sampling schemes when applied in a Positive and Unlabeled learning setup. We propose both a Relaxed Softmax loss (RS) and a new negative sampling scheme based on Boltzmann formulation. We show that the new training objective is better suited for the tasks of density estimation, item similarity and next-event prediction by driving uplifts in performance on textual and recommendation datasets against classical softmax.

IRSep 9, 2019
Recommendation System-based Upper Confidence Bound for Online Advertising

Nhan Nguyen-Thanh, Dana Marinca, Kinda Khawam et al.

In this paper, the method UCB-RS, which resorts to recommendation system (RS) for enhancing the upper-confidence bound algorithm UCB, is presented. The proposed method is used for dealing with non-stationary and large-state spaces multi-armed bandit problems. The proposed method has been targeted to the problem of the product recommendation in the online advertising. Through extensive testing with RecoGym, an OpenAI Gym-based reinforcement learning environment for the product recommendation in online advertising, the proposed method outperforms the widespread reinforcement learning schemes such as $ε$-Greedy, Upper Confidence (UCB1) and Exponential Weights for Exploration and Exploitation (EXP3).

IRJul 26, 2019
On the Value of Bandit Feedback for Offline Recommender System Evaluation

Olivier Jeunen, David Rohde, Flavian Vasile

In academic literature, recommender systems are often evaluated on the task of next-item prediction. The procedure aims to give an answer to the question: "Given the natural sequence of user-item interactions up to time t, can we predict which item the user will interact with at time t+1?". Evaluation results obtained through said methodology are then used as a proxy to predict which system will perform better in an online setting. The online setting, however, poses a subtly different question: "Given the natural sequence of user-item interactions up to time t, can we get the user to interact with a recommended item at time t+1?". From a causal perspective, the system performs an intervention, and we want to measure its effect. Next-item prediction is often used as a fall-back objective when information about interventions and their effects (shown recommendations and whether they received a click) is unavailable. When this type of data is available, however, it can provide great value for reliably estimating online recommender system performance. Through a series of simulated experiments with the RecoGym environment, we show where traditional offline evaluation schemes fall short. Additionally, we show how so-called bandit feedback can be exploited for effective offline evaluation that more accurately reflects online performance.

MLJun 14, 2019
Distributionally Robust Counterfactual Risk Minimization

Louis Faury, Ugo Tanielian, Flavian Vasile et al.

This manuscript introduces the idea of using Distributionally Robust Optimization (DRO) for the Counterfactual Risk Minimization (CRM) problem. Tapping into a rich existing literature, we show that DRO is a principled tool for counterfactual decision making. We also show that well-established solutions to the CRM problem like sample variance penalization schemes are special instances of a more general DRO problem. In this unifying framework, a variety of distributionally robust counterfactual risk estimators can be constructed using various probability distances and divergences as uncertainty measures. We propose the use of Kullback-Leibler divergence as an alternative way to model uncertainty in CRM and derive a new robust counterfactual objective. In our experiments, we show that this approach outperforms the state-of-the-art on four benchmark datasets, validating the relevance of using other uncertainty measures in practical applications.

IRApr 24, 2019
Three Methods for Training on Bandit Feedback

Dmytro Mykhaylov, David Rohde, Flavian Vasile et al.

There are three quite distinct ways to train a machine learning model on recommender system logs. The first method is to model the reward prediction for each possible recommendation to the user, at the scoring time the best recommendation is found by computing an argmax over the personalized recommendations. This method obeys principles such as the conditionality principle and the likelihood principle. A second method is useful when the model does not fit reality and underfits. In this case, we can use the fact that we know the distribution of historical recommendations (concentrated on previously identified good actions with some exploration) to adjust the errors in the fit to be evenly distributed over all actions. Finally, the inverse propensity score can be used to produce an estimate of the decision rules expected performance. The latter two methods violate the conditionality and likelihood principle but are shown to have good performance in certain settings. In this paper we review the literature around this fundamental, yet often overlooked choice and do some experiments using the RecoGym simulation environment.

IRApr 10, 2019
Causal Embeddings for Recommendation: An Extended Abstract

Stephen Bonner, Flavian Vasile

Recommendations are commonly used to modify user's natural behavior, for example, increasing product sales or the time spent on a website. This results in a gap between the ultimate business objective and the classical setup where recommendations are optimized to be coherent with past user behavior. To bridge this gap, we propose a new learning setup for recommendation that optimizes for the Incremental Treatment Effect (ITE) of the policy. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy and propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements.

IRAug 2, 2018
RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising

David Rohde, Stephen Bonner, Travis Dunlop et al.

Recommender Systems are becoming ubiquitous in many settings and take many forms, from product recommendation in e-commerce stores, to query suggestions in search engines, to friend recommendation in social networks. Current research directions which are largely based upon supervised learning from historical data appear to be showing diminishing returns with a lot of practitioners report a discrepancy between improvements in offline metrics for supervised learning and the online performance of the newly proposed models. One possible reason is that we are using the wrong paradigm: when looking at the long-term cycle of collecting historical performance data, creating a new version of the recommendation model, A/B testing it and then rolling it out. We see that there a lot of commonalities with the reinforcement learning (RL) setup, where the agent observes the environment and acts upon it in order to change its state towards better states (states with higher rewards). To this end we introduce RecoGym, an RL environment for recommendation, which is defined by a model of user traffic patterns on e-commerce and the users response to recommendations on the publisher websites. We believe that this is an important step forward for the field of recommendation systems research, that could open up an avenue of collaboration between the recommender systems and reinforcement learning communities and lead to better alignment between offline and online performance metrics.

LGMay 22, 2018
Adversarial Training of Word2Vec for Basket Completion

Ugo Tanielian, Mike Gartrell, Flavian Vasile

In recent years, the Word2Vec model trained with the Negative Sampling loss function has shown state-of-the-art results in a number of machine learning tasks, including language modeling tasks, such as word analogy and word similarity, and in recommendation tasks, through Prod2Vec, an extension that applies to modeling user shopping activity and user preferences. Several methods that aim to improve upon the standard Negative Sampling loss have been proposed. In our paper we pursue more sophisticated Negative Sampling, by leveraging ideas from the field of Generative Adversarial Networks (GANs), and propose Adversarial Negative Sampling. We build upon the recent progress made in stabilizing the training objective of GANs in the discrete data setting, and introduce a new GAN-Word2Vec model.We evaluate our model on the task of basket completion, and show significant improvements in performance over Word2Vec trained using standard loss functions, including Noise Contrastive Estimation and Negative Sampling.

NEMay 22, 2018
Neural Generative Models for Global Optimization with Gradients

Louis Faury, Flavian Vasile, Clément Calauzènes et al.

The aim of global optimization is to find the global optimum of arbitrary classes of functions, possibly highly multimodal ones. In this paper we focus on the subproblem of global optimization for differentiable functions and we propose an Evolutionary Search-inspired solution where we model point search distributions via Generative Neural Networks. This approach enables us to model diverse and complex search distributions based on which we can efficiently explore complicated objective landscapes. In our experiments we show the practical superiority of our algorithm versus classical Evolutionary Search and gradient-based solutions on a benchmark set of multimodal functions, and demonstrate how it can be used to accelerate Bayesian Optimization with Gaussian Processes.

IRMar 28, 2018
Siamese Cookie Embedding Networks for Cross-Device User Matching

Ugo Tanielian, Anne-Marie Tousch, Flavian Vasile

Over the last decade, the number of devices per person has increased substantially. This poses a challenge for cookie-based personalization applications, such as online search and advertising, as it narrows the personalization signal to a single device environment. A key task is to find which cookies belong to the same person to recover a complete cross-device user journey. Recent work on the topic has shown the benefits of using unsupervised embeddings learned on user event sequences. In this paper, we extend this approach to a supervised setting and introduce the Siamese Cookie Embedding Network (SCEmNet), a siamese convolutional architecture that leverages the multi-modal aspect of sequences, and show significant improvement over the state-of-the-art.

LGJan 22, 2018
Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces

Louis Faury, Flavian Vasile

Learning to optimize - the idea that we can learn from data algorithms that optimize a numerical criterion - has recently been at the heart of a growing number of research efforts. One of the most challenging issues within this approach is to learn a policy that is able to optimize over classes of functions that are fairly different from the ones that it was trained on. We propose a novel way of framing learning to optimize as a problem of learning a good navigation policy on a partially observable loss surface. To this end, we develop Rover Descent, a solution that allows us to learn a fairly broad optimization policy from training on a small set of prototypical two-dimensional surfaces that encompasses the classically hard cases such as valleys, plateaus, cliffs and saddles and by using strictly zero-order information. We show that, without having access to gradient or curvature information, we achieve state-of-the-art convergence speed on optimization problems not presented at training time such as the Rosenbrock function and other hard cases in two dimensions. We extend our framework to optimize over high dimensional landscapes, while still handling only two-dimensional local landscape information and show good preliminary results.

IRJun 23, 2017
Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks

Elena Smirnova, Flavian Vasile

Recommendations can greatly benefit from good representations of the user state at recommendation time. Recent approaches that leverage Recurrent Neural Networks (RNNs) for session-based recommendations have shown that Deep Learning models can provide useful user representations for recommendation. However, current RNN modeling approaches summarize the user state by only taking into account the sequence of items that the user has interacted with in the past, without taking into account other essential types of context information such as the associated types of user-item interactions, the time gaps between events and the time of day for each interaction. To address this, we propose a new class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that can take into account the contextual information both in the input and output layers and modifying the behavior of the RNN by combining the context embedding with the item embedding and more explicitly, in the model dynamics, by parametrizing the hidden unit transitions as a function of context information. We compare our CRNNs approach with RNNs and non-sequential baselines and show good improvements on the next event prediction task.

IRJun 23, 2017
Causal Embeddings for Recommendation

Stephen Bonner, Flavian Vasile

Many current applications use recommendations in order to modify the natural user behavior, such as to increase the number of sales or the time spent on a website. This results in a gap between the final recommendation objective and the classical setup where recommendation candidates are evaluated by their coherence with past user behavior, by predicting either the missing entries in the user-item matrix, or the most likely next event. To bridge this gap, we optimize a recommendation policy for the task of increasing the desired outcome versus the organic user behavior. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy. To this end, we propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements.

IRJun 23, 2017
Specializing Joint Representations for the task of Product Recommendation

Thomas Nedelec, Elena Smirnova, Flavian Vasile

We propose a unified product embedded representation that is optimized for the task of retrieval-based product recommendation. To this end, we introduce a new way to fuse modality-specific product embeddings into a joint product embedding, in order to leverage both product content information, such as textual descriptions and images, and product collaborative filtering signal. By introducing the fusion step at the very end of our architecture, we are able to train each modality separately, allowing us to keep a modular architecture that is preferable in real-world recommendation deployments. We analyze our performance on normal and hard recommendation setups such as cold-start and cross-category recommendations and achieve good performance on a large product shopping dataset.

IRJul 25, 2016
Meta-Prod2Vec - Product Embeddings Using Side-Information for Recommendation

Flavian Vasile, Elena Smirnova, Alexis Conneau

We propose Meta-Prod2vec, a novel method to compute item similarities for recommendation that leverages existing item metadata. Such scenarios are frequently encountered in applications such as content recommendation, ad targeting and web search. Our method leverages past user interactions with items and their attributes to compute low-dimensional embeddings of items. Specifically, the item metadata is in- jected into the model as side information to regularize the item embeddings. We show that the new item representa- tions lead to better performance on recommendation tasks on an open music dataset.

LGMar 11, 2016
Cost-sensitive Learning for Utility Optimization in Online Advertising Auctions

Flavian Vasile, Damien Lefortier, Olivier Chapelle

One of the most challenging problems in computational advertising is the prediction of click-through and conversion rates for bidding in online advertising auctions. An unaddressed problem in previous approaches is the existence of highly non-uniform misprediction costs. While for model evaluation these costs have been taken into account through recently proposed business-aware offline metrics -- such as the Utility metric which measures the impact on advertiser profit -- this is not the case when training the models themselves. In this paper, to bridge the gap, we formally analyze the relationship between optimizing the Utility metric and the log loss, which is considered as one of the state-of-the-art approaches in conversion modeling. Our analysis motivates the idea of weighting the log loss with the business value of the predicted outcome. We present and analyze a new cost weighting scheme and show that significant gains in offline and online performance can be achieved.