Elena Smirnova

8 papers

618 citations

Novelty 45%

AI Score 25

Ranked #173,947 of 205,806 authors (top 85%) | #1,726 in IR (top 78%)

8 Papers

ML | Sep 20, 2019

On the Convergence of Approximate and Regularized Policy Iteration Schemes

Elena Smirnova, Elvis Dohmatob

Entropy-regularized algorithms such as Soft Q-learning and Soft Actor-Critic have recently shown state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL objective and thus generally converges to a policy different from the optimal greedy policy of the original RL problem. In practice, it is important to control the sub-optimality of the regularized optimal policy. In this paper, we establish sufficient conditions for convergence of a large class of regularized dynamic programming algorithms, unified under regularized modified policy iteration (MPI) and conservative value iteration (VI) schemes. We provide explicit rates of convergence to optimality that depend on the decrease rate of the regularization parameter. Our experiments show that the empirical error closely follows the established theoretical convergence rates. In addition to optimality, we demonstrate two desirable behaviours of the regularized algorithms even in the absence of approximations: robustness to stochasticity of the environment and safety of the trajectories induced by the policy iterates.
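As a rough illustration of the idea in this abstract, the sketch below runs entropy-regularized (soft) value iteration with a regularization parameter that shrinks over iterations and compares the result with standard value iteration. It is not the paper's regularized MPI or conservative VI scheme: the `1/k` decay schedule, the random toy MDP, and the names `soft_value_iteration` and `value_iteration` are all assumptions made for this example.

```python
import numpy as np

def soft_value_iteration(P, R, gamma=0.9, n_iters=500, tau0=1.0):
    """Soft value iteration with a decaying temperature (assumed schedule tau0 / k).

    P: transition probabilities, shape (A, S, S); R: rewards, shape (S, A).
    """
    V = np.zeros(R.shape[0])
    for k in range(1, n_iters + 1):
        tau = tau0 / k  # assumption: regularization decreases as 1/k
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        # Soft maximum tau * logsumexp(Q / tau), shifted by the row max for stability
        q_max = Q.max(axis=1)
        V = q_max + tau * np.log(np.exp((Q - q_max[:, None]) / tau).sum(axis=1))
    return V

def value_iteration(P, R, gamma=0.9, n_iters=500):
    """Standard (unregularized) value iteration for comparison."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iters):
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V = Q.max(axis=1)
    return V

# Hypothetical toy MDP: 3 states, 2 actions, random dynamics and rewards.
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into probability distributions
R = rng.random((3, 2))

V_soft = soft_value_iteration(P, R)
V_hard = value_iteration(P, R)
gap = np.max(np.abs(V_soft - V_hard))  # sub-optimality gap of the soft values
```

With a fixed temperature the soft values would stay biased above the unregularized ones; letting the temperature shrink is what drives `gap` toward zero in this toy setting.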

ML | Jun 14, 2019

Distributionally Robust Counterfactual Risk Minimization

Louis Faury, Ugo Tanielian, Flavian Vasile et al.

This manuscript introduces the idea of using Distributionally Robust Optimization (DRO) for the Counterfactual Risk Minimization (CRM) problem. Tapping into a rich existing literature, we show that DRO is a principled tool for counterfactual decision making. We also show that well-established solutions to the CRM problem, such as sample variance penalization schemes, are special instances of a more general DRO problem. In this unifying framework, a variety of distributionally robust counterfactual risk estimators can be constructed using various probability distances and divergences as uncertainty measures. We propose the use of the Kullback-Leibler divergence as an alternative way to model uncertainty in CRM and derive a new robust counterfactual objective. In our experiments, we show that this approach outperforms the state of the art on four benchmark datasets, validating the relevance of using other uncertainty measures in practical applications.
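For orientation, the standard dual form of a worst-case expectation over a Kullback-Leibler ball is written out below. It is a generic DRO identity, not necessarily the exact objective derived in the paper; the symbols (loss \ell, reference distribution P, radius \epsilon, dual variable \gamma) are notation assumed for this sketch.

```latex
% Worst-case risk over all distributions Q within KL radius \epsilon of P,
% and its one-dimensional dual (assumed notation, generic KL-DRO result):
\sup_{Q \,:\, \mathrm{KL}(Q \,\|\, P) \le \epsilon} \mathbb{E}_{Q}[\ell]
  \;=\;
\inf_{\gamma > 0} \left\{ \gamma \epsilon
  + \gamma \log \mathbb{E}_{P}\!\left[ e^{\ell / \gamma} \right] \right\}
```

The right-hand side replaces a search over distributions with a search over a single scalar \gamma, which is what makes a KL-based robust risk practical to estimate from logged samples.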

ML | Feb 23, 2019

Distributionally Robust Reinforcement Learning

Elena Smirnova, Elvis Dohmatob, Jérémie Mary

Real-world applications require RL algorithms to act safely. During the learning process, it is likely that the agent executes sub-optimal actions that may lead to unsafe or poor states of the system. Exploration is particularly brittle in high-dimensional state/action spaces due to the increased number of low-performing actions. In this work, we consider risk-averse exploration in the approximate RL setting. To ensure safety during learning, we propose a distributionally robust policy iteration scheme that provides a lower-bound guarantee on state values. Our approach induces a dynamic level of risk to prevent poor decisions and yet preserves convergence to the optimal policy. Our formulation results in an efficient algorithm that amounts to a simple re-weighting of policy actions in the standard policy iteration scheme. We extend our approach to continuous state/action spaces and present a practical algorithm, distributionally robust soft actor-critic, that implements a different exploration strategy: it acts conservatively in the short term and explores optimistically in the long run. We provide promising experimental results on continuous control tasks.

IR | Sep 7, 2018

Action-conditional Sequence Modeling for Recommendation

Elena Smirnova

In many online applications, interactions between a user and a web service are organized sequentially, e.g., a user browsing an e-commerce website. In this setting, the recommendation system acts throughout user navigation by showing items. Previous works have addressed this recommendation setup through the task of predicting the next item the user will interact with. In particular, Recurrent Neural Networks (RNNs) have been shown to achieve substantial improvements over collaborative filtering baselines. In this paper, we consider interactions triggered by the recommendations of a deployed recommender system in addition to browsing behavior. Indeed, it is reported that in online services interactions with recommendations represent up to 30% of total interactions. Moreover, in practice, a recommender system can greatly influence user behavior by promoting specific items. We extend the RNN modeling framework by taking into account user interaction with recommended items. We propose and evaluate RNN architectures that consist of a recommendation action module and a state-action fusion module. Using real-world large-scale datasets, we demonstrate improved performance on the next-item prediction task compared to the baselines.

IR | Jul 23, 2018

Recurrent Neural Networks for Long and Short-Term Sequential Recommendation

Kiewan Villatel, Elena Smirnova, Jérémie Mary et al.

The objectives of recommender systems can be broadly characterized as modeling user preferences over short- or long-term time horizons. A large body of previous research studied long-term recommendation through dimensionality reduction techniques applied to historical user-item interactions. The recently introduced session-based recommendation setting highlighted the importance of modeling short-term user preferences. In this task, Recurrent Neural Networks (RNNs) have been shown to be successful at capturing the nuances of users' interactions within a short time window. In this paper, we evaluate RNN-based models on both short-term and long-term recommendation tasks. Our experimental results suggest that RNNs are capable of predicting immediate as well as distant user interactions. We also find the best-performing configuration to be a stacked RNN with layer normalization and tied item embeddings.

IR | Jun 23, 2017

Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks

Elena Smirnova, Flavian Vasile

Recommendations can greatly benefit from good representations of the user state at recommendation time. Recent approaches that leverage Recurrent Neural Networks (RNNs) for session-based recommendations have shown that Deep Learning models can provide useful user representations for recommendation. However, current RNN modeling approaches summarize the user state by only taking into account the sequence of items that the user has interacted with in the past, ignoring other essential types of context information such as the associated types of user-item interactions, the time gaps between events, and the time of day of each interaction. To address this, we propose a new class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that take contextual information into account in two ways: in the input and output layers, by combining the context embedding with the item embedding, and, more explicitly, in the model dynamics, by parametrizing the hidden unit transitions as a function of context information. We compare our CRNN approach with RNNs and non-sequential baselines and show good improvements on the next-event prediction task.

IR | Jun 23, 2017

Specializing Joint Representations for the task of Product Recommendation

Thomas Nedelec, Elena Smirnova, Flavian Vasile

We propose a unified product embedded representation that is optimized for the task of retrieval-based product recommendation. To this end, we introduce a new way to fuse modality-specific product embeddings into a joint product embedding, in order to leverage both product content information, such as textual descriptions and images, and product collaborative filtering signal. By introducing the fusion step at the very end of our architecture, we are able to train each modality separately, allowing us to keep a modular architecture that is preferable in real-world recommendation deployments. We analyze our performance on normal and hard recommendation setups such as cold-start and cross-category recommendations and achieve good performance on a large product shopping dataset.

IR | Jul 25, 2016

Meta-Prod2Vec - Product Embeddings Using Side-Information for Recommendation

Flavian Vasile, Elena Smirnova, Alexis Conneau

We propose Meta-Prod2vec, a novel method to compute item similarities for recommendation that leverages existing item metadata. Such scenarios are frequently encountered in applications such as content recommendation, ad targeting and web search. Our method leverages past user interactions with items and their attributes to compute low-dimensional embeddings of items. Specifically, the item metadata is injected into the model as side information to regularize the item embeddings. We show that the new item representations lead to better performance on recommendation tasks on an open music dataset.