IR May 9, 2022
Towards Feature Selection for Ranking and Classification Exploiting Quantum Annealers
Maurizio Ferrari Dacrema, Fabio Moroni, Riccardo Nembrini et al.
Feature selection is a common step in many ranking, classification, or prediction tasks and serves many purposes. By removing redundant or noisy features, the accuracy of ranking or classification can be improved and the computational cost of the subsequent learning steps can be reduced. However, feature selection can be itself a computationally expensive process. While for decades confined to theoretical algorithmic papers, quantum computing is now becoming a viable tool to tackle realistic problems, in particular special-purpose solvers based on the Quantum Annealing paradigm. This paper aims to explore the feasibility of using currently available quantum computing architectures to solve some quadratic feature selection algorithms for both ranking and classification. The experimental analysis includes 15 state-of-the-art datasets. The effectiveness obtained with quantum computing hardware is comparable to that of classical solvers, indicating that quantum computers are now reliable enough to tackle interesting problems. In terms of scalability, current generation quantum computers are able to provide a limited speedup over certain classical algorithms and hybrid quantum-classical strategies show lower computational cost for problems of more than a thousand features.
IR Nov 5, 2022
Feature Selection for Classification with QAOA
Gloria Turati, Maurizio Ferrari Dacrema, Paolo Cremonesi
Feature selection is of great importance in Machine Learning, where it can be used to reduce the dimensionality of classification, ranking and prediction problems. The removal of redundant and noisy features can improve both the accuracy and scalability of the trained models. However, feature selection is a computationally expensive task with a solution space that grows combinatorically. In this work, we consider in particular a quadratic feature selection problem that can be tackled with the Quantum Approximate Optimization Algorithm (QAOA), already employed in combinatorial optimization. First we represent the feature selection problem with the QUBO formulation, which is then mapped to an Ising spin Hamiltonian. Then we apply QAOA with the goal of finding the ground state of this Hamiltonian, which corresponds to the optimal selection of features. In our experiments, we consider seven different real-world datasets with dimensionality up to 21 and run QAOA on both a quantum simulator and, for small datasets, the 7-qubit IBM (ibm-perth) quantum computer. We use the set of selected features to train a classification model and evaluate its accuracy. Our analysis shows that it is possible to tackle the feature selection problem with QAOA and that currently available quantum devices can be used effectively. Future studies could test a wider range of classification models as well as improve the effectiveness of QAOA by exploring better performing optimizers for its classical step.
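The QUBO-to-Ising step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard substitution x_i = (1 + s_i) / 2 with s_i in {-1, +1}; the dictionary layout, the function names, and the toy coefficients are illustrative assumptions and are not taken from the paper.

```python
from itertools import product

def qubo_to_ising(Q):
    """Map a QUBO {(i, j): coeff} to Ising fields h, couplings J and a constant offset.

    Entries with i == j are linear terms, entries with i != j are quadratic terms.
    Uses x_i = (1 + s_i) / 2, so that the QUBO and Ising energies coincide.
    """
    h, J, offset = {}, {}, 0.0
    for (i, j), q in Q.items():
        if i == j:
            h[i] = h.get(i, 0.0) + q / 2
            offset += q / 2
        else:
            key = (min(i, j), max(i, j))
            J[key] = J.get(key, 0.0) + q / 4
            h[i] = h.get(i, 0.0) + q / 4
            h[j] = h.get(j, 0.0) + q / 4
            offset += q / 4
    return h, J, offset

def qubo_energy(Q, x):
    # For binary x, x_i * x_i == x_i, so diagonal entries act as linear terms.
    return sum(q * x[i] * x[j] for (i, j), q in Q.items())

def ising_energy(h, J, offset, s):
    linear = sum(hi * s[i] for i, hi in h.items())
    quadratic = sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return linear + quadratic + offset

def brute_force_ground_state(Q, n):
    """Exhaustively find the lowest-energy binary vector (only viable for tiny n)."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))

# Hypothetical 3-feature instance: negative diagonal rewards relevant features,
# positive off-diagonal terms penalize selecting redundant pairs.
toy_Q = {(0, 0): -1.0, (1, 1): -2.0, (2, 2): -1.5, (0, 1): 2.0, (1, 2): 0.5}
```

In this toy instance the ground state of the Ising Hamiltonian corresponds to the same selection as the QUBO minimum, which is the property QAOA relies on; the exhaustive search stands in for the quantum solver only to make the equivalence checkable.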
IR Aug 15, 2023
Impression-Aware Recommender Systems
Fernando B. Pérez Maurera, Maurizio Ferrari Dacrema, Pablo Castells et al.
Novel data sources bring new opportunities to improve the quality of recommender systems and serve as a catalyst for the creation of new paradigms on personalized recommendations. Impressions are a novel data source containing the items shown to users on their screens. Past research focused on providing personalized recommendations using interactions, and occasionally using impressions when such a data source was available. Interest in impressions has increased due to their potential to provide more accurate recommendations. Despite this increased interest, research in recommender systems using impressions is still dispersed. Many works have distinct interpretations of impressions and use impressions in recommender systems in numerous different manners. To unify those interpretations into a single framework, we present a systematic literature review on recommender systems using impressions, focusing on three fundamental perspectives: recommendation models, datasets, and evaluation methodologies. We define a theoretical framework to delimit recommender systems using impressions and a novel paradigm for personalized recommendations, called impression-aware recommender systems. We propose a classification system for recommenders in this paradigm, which we use to categorize the recommendation models, datasets, and evaluation methodologies used in past research. Lastly, we identify open questions and future directions, highlighting missing aspects in the reviewed literature.
QUANT-PH Aug 3, 2023
Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances
Gloria Turati, Maurizio Ferrari Dacrema, Paolo Cremonesi
In recent years, Variational Quantum Algorithms (VQAs) have emerged as a promising approach for solving optimization problems on quantum computers in the NISQ era. However, one limitation of VQAs is their reliance on fixed-structure circuits, which may not be tailored for specific problems or hardware configurations. A leading strategy to address this issue is the use of Adaptative VQAs, which dynamically modify the circuit structure by adding and removing gates, and optimize their parameters during training. Several Adaptative VQAs, based on heuristics such as circuit shallowness, entanglement capability and hardware compatibility, have already been proposed in the literature, but a systematic comparison between the different methods is still lacking. In this paper, we aim to fill this gap by analyzing three Adaptative VQAs: Evolutionary Variational Quantum Eigensolver (EVQE) and Variable Ansatz (VAns), both already proposed in the literature, and Random Adapt-VQE (RA-VQE), a random approach we introduce as a baseline. In order to compare these algorithms to traditional VQAs, we also include the Quantum Approximate Optimization Algorithm (QAOA) in our analysis. We apply these algorithms to QUBO problems and study their performance by examining the quality of the solutions found and the computational times required. Additionally, we investigate how the choice of the hyperparameters can impact the overall performance of the algorithms, highlighting the importance of selecting an appropriate methodology for hyperparameter tuning. Our analysis sets benchmarks for Adaptative VQAs designed for near-term quantum devices and provides valuable insights to guide future research in this area.
QUANT-PH Sep 9, 2024
Reinforcement Learning for Variational Quantum Circuits Design
Simone Foderà, Gloria Turati, Riccardo Nembrini et al.
Variational Quantum Algorithms have emerged as promising tools for solving optimization problems on quantum computers. These algorithms leverage a parametric quantum circuit called an ansatz, whose parameters are adjusted by a classical optimizer with the goal of optimizing a certain cost function. However, a significant challenge lies in designing effective circuits for addressing specific problems. In this study, we leverage the powerful and flexible Reinforcement Learning paradigm to train an agent capable of autonomously generating quantum circuits that can be used as ansatzes in variational algorithms to solve optimization problems. The agent is trained on diverse problem instances, including Maximum Cut, Maximum Clique and Minimum Vertex Cover, built from different graph topologies and sizes. Our analysis of the circuits generated by the agent and the corresponding solutions shows that the proposed method is able to generate effective ansatzes. While our goal is not to propose any new specific ansatz, we observe that the agent has discovered a novel family of ansatzes effective for Maximum Cut problems, which we call $R_{yz}$-connected. We study the characteristics of one of these ansatzes by comparing it against state-of-the-art quantum algorithms across instances of varying graph topologies, sizes, and problem types. Our results indicate that the $R_{yz}$-connected circuit achieves high approximation ratios for Maximum Cut problems, further validating our proposed agent. In conclusion, our study highlights the potential of Reinforcement Learning techniques in assisting researchers to design effective quantum circuits, which could have applications in a wide range of tasks.
QUANT-PH Aug 1, 2024
Analyzing the Effectiveness of Quantum Annealing with Meta-Learning
Riccardo Pellini, Maurizio Ferrari Dacrema
The field of Quantum Computing has gathered significant popularity in recent years and a large number of papers have studied its effectiveness in tackling many tasks. We focus in particular on Quantum Annealing (QA), a meta-heuristic solver for Quadratic Unconstrained Binary Optimization (QUBO) problems. It is known that the effectiveness of QA depends on the task itself, as is the case for classical solvers, but there is not yet a clear understanding of which characteristics of a problem make it difficult to solve with QA. In this work, we propose a new methodology to study the effectiveness of QA based on meta-learning models. To do so, we first build a dataset composed of more than five thousand instances of ten different optimization problems. We define a set of more than a hundred features to describe their characteristics, and solve them with both QA and three classical solvers. We publish this dataset online for future research. Then, we train multiple meta-models to predict whether QA would solve a given instance effectively and use them to probe which features have the strongest impact on the effectiveness of QA. Our results indicate that it is possible to accurately predict the effectiveness of QA, validating our methodology. Furthermore, we observe that the distribution of the problem coefficients representing the bias and coupling terms is very informative for identifying the probability of finding good solutions, while the density of these coefficients alone is not enough. The methodology we propose opens new research directions to further our understanding of the effectiveness of QA, by probing specific dimensions or by developing new QUBO formulations that are better suited to the particular nature of QA. Furthermore, the proposed methodology is flexible and can be extended or used to study other quantum or classical solvers.
QUANT-PH Aug 5, 2024
Adaptive Learning for Quantum Linear Regression
Costantino Carugno, Maurizio Ferrari Dacrema, Paolo Cremonesi
The recent availability of quantum annealers as cloud-based services has enabled new ways to handle machine learning problems, and several relevant algorithms have been adapted to run on these devices. In a recent work, linear regression was formulated as a quadratic binary optimization problem that can be solved via quantum annealing. Although this approach promises a computational time advantage for large datasets, the quality of the solution is limited by the necessary use of a precision vector, used to approximate the real-numbered regression coefficients in the quantum formulation. In this work, we focus on the practical challenge of improving the precision vector encoding: instead of setting an array of generic values equal for all coefficients, we allow each one to be expressed by its specific precision, which is tuned with a simple adaptive algorithm. This approach is evaluated on synthetic datasets of increasing size, and linear regression is solved using the D-Wave Advantage quantum annealer, as well as classical solvers. To the best of our knowledge, this is the largest dataset ever evaluated for linear regression on a quantum annealer. The results show that our formulation is able to deliver improved solution quality in all instances, and could better exploit the potential of current quantum devices.
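To make the role of the precision vector concrete, the following minimal sketch encodes a real-valued coefficient as a sum of precision values selected by binary variables and finds the closest representable value by enumeration. The precision values, the target coefficient, and the function names are illustrative assumptions; the adaptive tuning algorithm from the paper is not reproduced here.

```python
from itertools import product

def decode(bits, precision):
    """Real value represented by a binary assignment over a precision vector."""
    return sum(b * p for b, p in zip(bits, precision))

def best_approximation(target, precision):
    """Closest value to `target` that the precision vector can represent.

    Enumerates all 2**len(precision) binary assignments, so it is only meant
    for illustration on very short precision vectors.
    """
    candidates = (
        decode(bits, precision)
        for bits in product((0, 1), repeat=len(precision))
    )
    return min(candidates, key=lambda value: abs(value - target))

# Hypothetical example: one generic vector shared by all coefficients versus
# a narrower vector chosen for a coefficient known to be small.
generic_precision = [-2.0, 1.0, 0.5, 0.25]
specific_precision = [-0.5, 0.25, 0.125, 0.0625]
target_coefficient = 0.3125
```

With the same number of binary variables, the narrower vector can represent the target exactly while the generic one cannot, which is the kind of gain that a per-coefficient precision is meant to capture.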
IR Aug 3, 2020
ContentWise Impressions: An Industrial Dataset with Impressions Included
Fernando Benjamín Pérez Maurera, Maurizio Ferrari Dacrema, Lorenzo Saule et al.
In this article, we introduce the ContentWise Impressions dataset, a collection of implicit interactions and impressions of movies and TV series from an Over-The-Top media service, which delivers its media contents over the Internet. The dataset is distinguished from other already available multimedia recommendation datasets by the availability of impressions, i.e., the recommendations shown to the user, its size, and by being open-source. We describe the data collection process, the preprocessing applied, its characteristics, and statistics when compared to other commonly used datasets. We also highlight several possible use cases and research questions that can benefit from the availability of user impressions in an open-source dataset. Furthermore, we release software tools to load and split the data, as well as examples of how to use both user interactions and impressions in several common recommendation algorithms.
IR Jul 16, 2019
Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches
Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach
Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area. Source code of our experiments and full results are available at: https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation.
IR May 14, 2025
Diffusion Recommender Models and the Illusion of Progress: A Concerning Study of Reproducibility and a Conceptual Mismatch
Michael Benigni, Maurizio Ferrari Dacrema, Dietmar Jannach
Countless new machine learning models are published every year and are reported to significantly advance the state-of-the-art in top-n recommendation. However, earlier reproducibility studies indicate that progress in this area may be quite limited. Specifically, various widespread methodological issues, e.g., comparisons with untuned baseline models, have led to an illusion of progress. In this work, our goal is to examine whether these problems persist in today's research. To this end, we aim to reproduce the latest advancements reported from applying modern Denoising Diffusion Probabilistic Models to recommender systems, focusing on four models published at the top-ranked SIGIR conference in 2023 and 2024. Our findings are concerning, revealing persistent methodological problems. Alarmingly, through experiments, we find that the latest recommendation techniques based on diffusion models, despite their computational complexity and substantial carbon footprint, are consistently outperformed by simpler existing models. Furthermore, we identify key mismatches between the characteristics of diffusion models and those of the traditional top-n recommendation task, raising doubts about their suitability for recommendation. We also note that, in the papers we analyze, the generative capabilities of these models are constrained to a minimum. Overall, our results and continued methodological issues call for greater scientific rigor and a disruptive change in the research and publication culture in this area.
IR Mar 10, 2025
Reproducibility and Artifact Consistency of the SIGIR 2022 Recommender Systems Papers Based on Message Passing
Maurizio Ferrari Dacrema, Michael Benigni, Nicola Ferro
Graph-based techniques relying on neural networks and embeddings have gained attention as a way to develop Recommender Systems (RS) with several papers on the topic presented at SIGIR 2022 and 2023. Given the importance of ensuring that published research is methodologically sound and reproducible, in this paper we analyze 10 graph-based RS papers, most of which were published at SIGIR 2022, and assess their impact on subsequent work published in SIGIR 2023. Our analysis reveals several critical points that require attention: (i) the prevalence of bad practices, such as erroneous data splits or information leakage between training and testing data, which call into question the validity of the results; (ii) frequent inconsistencies between the provided artifacts (source code and data) and their descriptions in the paper, causing uncertainty about what is actually being evaluated; and (iii) the preference for new or complex baselines that are weaker compared to simpler ones, creating the impression of continuous improvement even when, particularly for the Amazon-Book dataset, the state-of-the-art has significantly worsened. Due to these issues, we are unable to confirm the claims made in most of the papers we examined and attempted to reproduce.
QUANT-PH Jul 21, 2025
Minor Embedding for Quantum Annealing with Reinforcement Learning
Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi
Quantum Annealing (QA) is a quantum computing paradigm for solving combinatorial optimization problems formulated as Quadratic Unconstrained Binary Optimization (QUBO) problems. An essential step in QA is minor embedding, which maps the problem graph onto the sparse topology of the quantum processor. This process is computationally expensive and scales poorly with increasing problem size and hardware complexity. Existing heuristics are often developed for specific problem graphs or hardware topologies and are difficult to generalize. Reinforcement Learning (RL) offers a promising alternative by treating minor embedding as a sequential decision-making problem, where an agent learns to construct minor embeddings by iteratively mapping the problem variables to the hardware qubits. We propose an RL-based approach to minor embedding using a Proximal Policy Optimization agent, testing its ability to embed both fully connected and randomly generated problem graphs on two hardware topologies, Chimera and Zephyr. The results show that our agent consistently produces valid minor embeddings with a reasonably efficient number of qubits, in particular on the more modern Zephyr topology. Our proposed approach is also able to scale to moderate problem sizes and adapts well to different graph structures, highlighting RL's potential as a flexible and general-purpose framework for minor embedding in QA.
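What makes a minor embedding "valid" can be sketched as a small checker. It assumes the usual three conditions: chains of physical qubits are non-empty and disjoint, each chain is connected in the hardware graph, and every problem edge is backed by at least one hardware coupler between the two chains. The function name, the data layout, and the toy graphs are illustrative assumptions, not the paper's implementation.

```python
def is_valid_embedding(problem_edges, hardware_edges, embedding):
    """Check a minor embedding {logical variable: list of physical qubits}."""
    # Build an adjacency map for the hardware graph.
    adjacency = {}
    for a, b in hardware_edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Condition 1: chains are non-empty and do not share physical qubits.
    used = set()
    for chain in embedding.values():
        qubits = set(chain)
        if not qubits or used & qubits:
            return False
        used |= qubits

    # Condition 2: each chain is connected within the hardware graph.
    for chain in embedding.values():
        qubits = set(chain)
        start = next(iter(qubits))
        seen, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for neighbor in adjacency.get(current, ()):
                if neighbor in qubits and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if seen != qubits:
            return False

    # Condition 3: every problem edge has a hardware coupler between its chains.
    for u, v in problem_edges:
        if not any(
            b in adjacency.get(a, ())
            for a in embedding[u]
            for b in embedding[v]
        ):
            return False
    return True

# Hypothetical example: a triangle problem graph on a 4-qubit ring.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

A triangle is not a subgraph of a 4-qubit ring, but it becomes embeddable once one variable is represented by a chain of two adjacent qubits, which is why chain length (the number of qubits used) is a natural measure of embedding efficiency.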
QUANT-PH Jul 21, 2025
Automated Design of Structured Variational Quantum Circuits with Reinforcement Learning
Gloria Turati, Simone Foderà, Riccardo Nembrini et al.
Variational Quantum Algorithms (VQAs) are among the most promising approaches for leveraging near-term quantum hardware, yet their effectiveness strongly depends on the design of the underlying circuit ansatz, which is typically constructed with heuristic methods. In this work, we represent the synthesis of variational quantum circuits as a sequential decision-making problem, where gates are added iteratively in order to optimize an objective function, and we introduce two reinforcement learning-based methods, RLVQC Global and RLVQC Block, tailored to combinatorial optimization problems. RLVQC Block creates ansatzes that generalize the Quantum Approximate Optimization Algorithm (QAOA) by discovering a two-qubit block that is applied to all the interacting qubit pairs. RLVQC Global, in contrast, further generalizes the ansatz and adds gates unconstrained by the structure of the interacting qubits. Both methods adopt the Proximal Policy Optimization (PPO) algorithm and use empirical measurement outcomes as state observations to guide the agent. We evaluate the proposed methods on a broad set of QUBO instances derived from classical graph-based optimization problems. Our results show that both RLVQC methods perform strongly, with RLVQC Block consistently outperforming QAOA and generally surpassing RLVQC Global. While RLVQC Block produces circuits with depth comparable to QAOA, the Global variant is instead able to find significantly shorter ones. These findings suggest that reinforcement learning methods can be an effective tool to discover new ansatz structures tailored for specific problems and that the most effective circuit design strategy lies between rigid predefined architectures and completely unconstrained ones, offering a favourable trade-off between structure and adaptability.
LG Jun 26, 2024
Automated Off-Policy Estimator Selection via Supervised Learning
Nicolò Felicioni, Michael Benigni, Maurizio Ferrari Dacrema
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by another one. To solve the OPE problem, we resort to estimators, which aim to estimate in the most accurate way possible the performance that the counterfactual policies would have had if they were deployed in place of the logging policy. In the literature, several estimators have been developed, all with different characteristics and theoretical guarantees. Therefore, there is no dominant estimator and each estimator may be the best for different OPE problems, depending on the characteristics of the dataset at hand. Although the selection of the estimator is a crucial choice for an accurate OPE, this problem has been widely overlooked in the literature. We propose an automated data-driven OPE estimator selection method based on supervised learning. In particular, the core idea we propose in this paper is to create several synthetic OPE tasks and use a machine learning model trained to predict the best estimator for those synthetic tasks. We empirically show how our method is able to perform a better estimator selection compared to a baseline method on several real-world datasets, with a computational cost significantly lower than the one of the baseline.
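As a concrete picture of what an OPE estimator computes, here is a minimal sketch of two standard estimators, Inverse Propensity Scoring (IPS) and its self-normalized variant (SNIPS). They are shown as generic textbook examples and are not claimed to be the estimators considered in the paper; the function names and the logged data are illustrative assumptions.

```python
def importance_weights(logging_probs, target_probs):
    """Ratio between target-policy and logging-policy probabilities of each logged action."""
    return [pe / p0 for p0, pe in zip(logging_probs, target_probs)]

def ips_estimate(rewards, logging_probs, target_probs):
    """IPS: average of the rewards reweighted by the importance weights."""
    weights = importance_weights(logging_probs, target_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)

def snips_estimate(rewards, logging_probs, target_probs):
    """SNIPS: same numerator as IPS, normalized by the sum of the weights."""
    weights = importance_weights(logging_probs, target_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)

# Hypothetical logged data: one entry per logged interaction.
logged_rewards = [1.0, 0.0, 1.0, 1.0]
logged_propensities = [0.5, 0.5, 0.25, 0.5]   # probability under the logging policy
target_propensities = [0.5, 0.25, 0.5, 1.0]   # probability under the counterfactual policy
```

The two estimators generally return different values on the same log, because IPS is unbiased but can have high variance while SNIPS trades a small bias for lower variance. That disagreement is why the choice of estimator for a given dataset matters.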
IR Jan 5, 2022
An Evaluation Study of Generative Adversarial Networks for Collaborative Filtering
Fernando Benjamín Pérez Maurera, Maurizio Ferrari Dacrema, Paolo Cremonesi
This work explores the reproducibility of CFGAN. CFGAN and its family of models (TagRec, MTPR, and CRGAN) learn to generate personalized and fake-but-realistic rankings of preferences for top-N recommendations by using previous interactions. This work successfully replicates the results published in the original paper and discusses the impact of certain differences between the CFGAN framework and the model used in the original evaluation. The absence of random noise and the use of real user profiles as condition vectors leave the generator prone to learning a degenerate solution in which the output vector is identical to the input vector, so that it behaves essentially as a simple autoencoder. The work further expands the experimental analysis by comparing CFGAN against a selection of simple, well-known, and properly optimized baselines, observing that CFGAN is not consistently competitive against them despite its high computational cost. To ensure the reproducibility of these analyses, this work describes the experimental methodology and publishes all datasets and source code.
IR Oct 11, 2021
Feature Selection for Recommender Systems with Quantum Computing
Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi
The promise of quantum computing to open new unexplored possibilities in several scientific fields has been long discussed, but until recently the lack of a functional quantum computer has confined this discussion mostly to theoretical algorithmic papers. It was only in the last few years that small but functional quantum computers have become available to the broader research community. One paradigm in particular, quantum annealing, can be used to sample optimal solutions for a number of NP-hard optimization problems represented with classical operations research tools, providing an easy access to the potential of this emerging technology. One of the tasks that most naturally fits in this mathematical formulation is feature selection. In this paper, we investigate how to design a hybrid feature selection algorithm for recommender systems that leverages the domain knowledge and behavior hidden in the user interactions data. We represent the feature selection as an optimization problem and solve it on a real quantum computer, provided by D-Wave. The results indicate that the proposed approach is effective in selecting a limited set of important features and that quantum computers are becoming powerful enough to enter the wider realm of applied science.
IR May 14, 2021
Measuring the User Satisfaction in a Recommendation Interface with Multiple Carousels
Nicolò Felicioni, Maurizio Ferrari Dacrema, Paolo Cremonesi
It is common for video-on-demand and music streaming services to adopt a user interface composed of several recommendation lists, i.e. widgets or swipeable carousels, each generated according to a specific criterion or algorithm (e.g. most recent, top popular, recommended for you, editors' choice, etc.). Selecting the appropriate combination of carousels has a significant impact on user satisfaction. A crucial aspect of this user interface is that, to measure the relevance of a new carousel for the user, it is not sufficient to account solely for its individual quality. Instead, it should be considered that other carousels will already be present in the interface. This is not considered by traditional evaluation protocols for recommender systems, in which each carousel is evaluated in isolation, regardless of (i) which other carousels are displayed to the user and (ii) the relative position of the carousel with respect to other carousels. Hence, we propose a two-dimensional evaluation protocol for a carousel setting that measures the quality of a recommendation carousel based on how much it improves upon the quality of an already available set of carousels. Our evaluation protocol also takes into account the position bias, i.e. users do not explore the carousels sequentially, but rather concentrate on the top-left corner of the screen. We report experiments on the movie domain and notice that, under a carousel setting, the criteria that should be preferred to generate a list of recommended items differ from what is commonly understood.
IR May 13, 2021
A Methodology for the Offline Evaluation of Recommender Systems in a User Interface with Multiple Carousels
Nicolò Felicioni, Maurizio Ferrari Dacrema, Paolo Cremonesi
Many video-on-demand and music streaming services provide the user with a page consisting of several recommendation lists, i.e. widgets or swipeable carousels, each built with a specific criterion (e.g. most recent, TV series, etc.). Finding efficient strategies to select which carousels to display is an active research topic of great industrial interest. In this setting, the overall quality of the recommendations of a new algorithm cannot be assessed by measuring solely its individual recommendation quality. Rather, it should be evaluated in a context where other recommendation lists are already available, to account for how they complement each other. This is not considered by traditional offline evaluation protocols. Hence, we propose an offline evaluation protocol for a carousel setting in which the recommendation quality of a model is measured by how much it improves upon that of an already available set of carousels. We report experiments on publicly available datasets in the movie domain and notice that, under a carousel setting, the ranking of the algorithms changes. In particular, when a SLIM carousel is available, matrix factorization models tend to be preferred, while item-based models are penalized. We also propose to extend ranking metrics to the two-dimensional carousel layout in order to account for a known position bias, i.e. users will not explore the lists sequentially, but rather concentrate on the top-left corner of the screen.
IR Oct 13, 2020
Artist-driven layering and user's behaviour impact on recommendations in a playlist continuation scenario
Sebastiano Antenucci, Simone Boglio, Emanuele Chioso et al.
In this paper we provide an overview of the approach we used as team Creamy Fireflies for the ACM RecSys Challenge 2018. The competition, organized by Spotify, focuses on the problem of playlist continuation, that is, suggesting which tracks the user may add to an existing playlist. The challenge addresses this issue in many use cases, from playlist cold start to playlists already composed of up to a hundred tracks. Our team proposes a solution based on a few well-known models, both content-based and collaborative, whose predictions are aggregated via an ensembling step. Moreover, by analyzing the underlying structure of the data, we propose a series of boosts to be applied on top of the final predictions to improve the recommendation quality. The proposed approach leverages well-known algorithms and is able to offer a high recommendation quality while requiring a limited amount of computational resources.
IR Jul 23, 2020
Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems
Maurizio Ferrari Dacrema, Federico Parroni, Paolo Cremonesi et al.
In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques and their latent factor models to neural approaches. However, given the proven power of latent factor models, some newer neural approaches incorporate them within more complex network architectures. One specific idea, recently put forward by several researchers, is to consider potential correlations between the latent factors, i.e., embeddings, by applying convolutions over the user-item interaction map. However, contrary to what is claimed in these articles, such interaction maps do not share the properties of images where Convolutional Neural Networks (CNNs) are particularly useful. In this work, we show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers. Moreover, additional performance evaluations show that all of the examined recent CNN-based models are outperformed by existing non-neural machine learning techniques or traditional nearest-neighbor approaches. On a more general level, our work points to major methodological issues in recommender systems research.
IR Nov 18, 2019
A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research
Maurizio Ferrari Dacrema, Simone Boglio, Paolo Cremonesi et al.
The design of algorithms that generate personalized ranked item lists is a central topic of research in the field of recommender systems. In the past few years, in particular, approaches based on deep learning (neural) techniques have become dominant in the literature. For all of them, substantial progress over the state-of-the-art is claimed. However, indications exist of certain problems in today's research practice, e.g., with respect to the choice and optimization of the baselines used for comparison, raising questions about the published claims. In order to obtain a better understanding of the actual progress, we have tried to reproduce recent results in the area of neural recommendation approaches based on collaborative filtering. The worrying outcome of the analysis of these recent works (all were published at prestigious scientific conferences between 2015 and 2018) is that 11 out of the 12 reproducible neural approaches can be outperformed by conceptually simple methods, e.g., based on the nearest-neighbor heuristics. None of the computationally complex neural methods was actually consistently better than already existing learning-based techniques, e.g., using matrix factorization or linear models. In our analysis, we discuss common issues in today's research practice, which, despite the many papers that are published on the topic, have apparently led the field to a certain level of stagnation.
IRAug 29, 2019
Towards Evaluating User Profiling Methods Based on Explicit Ratings on Item FeaturesLuca Luciano Costanzo, Yashar Deldjoo, Maurizio Ferrari Dacrema et al.
In order to improve the accuracy of recommendations, many recommender systems nowadays use side information beyond the user rating matrix, such as item content. These systems build user profiles as estimates of users' interest in content (e.g., movie genre, director or cast) and then evaluate the performance of the recommender system as a whole, e.g., by its ability to recommend relevant and novel items to the target user. The user profile modelling stage, which is a key stage in content-driven RS, is rarely properly evaluated due to the lack of publicly available datasets that contain user preferences on content features of items. To raise awareness of this fact, we investigate differences between explicit user preferences and implicit user profiles. We create a dataset of explicit preferences towards content features of movies, which we release publicly. We then compare the collected explicit user feature preferences and implicit user profiles built via state-of-the-art user profiling models. Our results show a maximum average pairwise cosine similarity of 58.07% between the explicit feature preferences and the implicit user profiles modelled by the best investigated profiling method and considering movies' genres only. For actors and directors, this maximum similarity is only 9.13% and 17.24%, respectively. This low similarity between explicit and implicit preference models encourages a more in-depth study to investigate and improve this important user profile modelling step, which will eventually translate into better recommendations.
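As an illustration of the comparison metric mentioned in this abstract, the following is a minimal sketch of an average pairwise cosine similarity. It assumes that "pairwise" means pairing each user's explicit preference vector with the same user's implicit profile; the function name and the matrix layout (one row per user, one column per content feature) are hypothetical and not taken from the paper.

```python
import numpy as np

def mean_pairwise_cosine(explicit, implicit):
    """Average, over users, of the cosine similarity between each user's
    explicit and implicit feature-preference vectors (rows are users)."""
    explicit = np.asarray(explicit, dtype=float)
    implicit = np.asarray(implicit, dtype=float)
    dots = np.sum(explicit * implicit, axis=1)
    norms = np.linalg.norm(explicit, axis=1) * np.linalg.norm(implicit, axis=1)
    # Users with an all-zero profile get similarity 0 instead of a division by zero.
    sims = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)
    return float(sims.mean())
```

A value near 1 would mean the implicit profiles point in almost the same direction as the stated preferences; the figures reported above correspond to values of this kind expressed as percentages.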
IRNov 5, 2018
Deriving item features relevance from collaborative domain knowledgeMaurizio Ferrari Dacrema, Alberto Gasparin, Paolo Cremonesi
An item-based recommender system works by computing a similarity between items, which can exploit past user interactions (collaborative filtering) or item features (content-based filtering). Collaborative algorithms have been proven to achieve better recommendation quality than content-based algorithms in a variety of scenarios, being more effective in modeling user behaviour. However, they cannot be applied when items have no interactions at all, i.e., cold-start items. Content-based algorithms, which are applicable to cold-start items, often require a lot of feature engineering in order to generate useful recommendations. This issue is especially relevant as the content descriptors become large and heterogeneous. The focus of this paper is on how to use a collaborative model's domain-specific knowledge to build a wrapper feature weighting method which embeds collaborative knowledge in a content-based algorithm. We present a comparative study of different state-of-the-art algorithms and present a more general model. This machine learning approach to feature weighting shows promising results and high flexibility.
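The cold-start limitation described in this abstract can be seen in a minimal sketch of item-based collaborative filtering. The sketch assumes cosine similarity over a dense user-item interaction matrix; the function names are hypothetical, and this is a generic baseline, not the feature weighting method proposed in the paper.

```python
import numpy as np

def item_cosine_similarity(urm):
    """Item-item cosine similarity from a user-item interaction matrix (users x items)."""
    urm = np.asarray(urm, dtype=float)
    norms = np.linalg.norm(urm, axis=0)
    norms[norms == 0] = 1.0  # cold-start items: keep their all-zero columns unchanged
    unit = urm / norms
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)  # an item should not recommend itself
    return sim

def score_items(urm, sim):
    """Predicted scores: each user's interactions propagated through item similarity."""
    return np.asarray(urm, dtype=float) @ sim
```

An item with no interactions has an all-zero column, so its similarity to every other item is zero and it never receives a positive score. This is why a content-based similarity is needed for such items.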
LGAug 31, 2018
A novel graph-based model for hybrid recommendations in cold-start scenariosCesare Bernardis, Maurizio Ferrari Dacrema, Paolo Cremonesi
Cold-start is a very common and still open problem in the Recommender Systems literature. Since cold-start items do not have any interactions, collaborative algorithms are not applicable. One of the main strategies is to use pure or hybrid content-based approaches, which usually yield lower recommendation quality than collaborative ones. Some techniques to optimize the performance of this type of approach have been studied in the recent past. One of them is called feature weighting, which assigns to every feature a real value, called weight, that estimates its importance. Statistical techniques for feature weighting commonly used in Information Retrieval, like TF-IDF, have been adapted for Recommender Systems, but they often do not provide sufficient quality improvements. More recent approaches, FBSM and LFW, estimate weights by leveraging collaborative information via machine learning, in order to learn the importance of a feature based on other users' opinions. This type of model has shown promising results compared to the classic statistical analyses cited previously. We propose a novel graph-based, feature-based machine learning model to face the cold-start item scenario, learning the relevance of features from probabilities of item-based collaborative filtering algorithms.
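To make the idea of feature weighting concrete, here is a minimal sketch of the statistical kind of weighting the abstract mentions. It assumes a binary item-feature matrix and uses only the IDF part of TF-IDF; the function names are hypothetical. It is not FBSM, LFW, or the graph-based model proposed in the paper, which learn the weights from collaborative information instead.

```python
import numpy as np

def idf_weights(icm):
    """IDF weight per feature from a binary item-feature matrix (items x features)."""
    icm = np.asarray(icm, dtype=float)
    n_items = icm.shape[0]
    doc_freq = np.count_nonzero(icm, axis=0)
    # Features that appear in no item get weight 0.
    weights = np.zeros(icm.shape[1])
    seen = doc_freq > 0
    weights[seen] = np.log(n_items / doc_freq[seen])
    return weights

def weighted_item_similarity(icm, weights):
    """Cosine similarity between items after scaling each feature by its weight."""
    weighted = np.asarray(icm, dtype=float) * weights
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for items with no weighted features
    unit = weighted / norms
    return unit @ unit.T
```

With these weights, a feature shared by every item gets weight zero and stops contributing to similarity, while rare features dominate. A learned weighting would replace `idf_weights` and keep the similarity computation unchanged.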
IRAug 31, 2018
Eigenvalue analogy for confidence estimation in item-based recommender systemsMaurizio Ferrari Dacrema, Paolo Cremonesi
Item-item collaborative filtering (CF) models are a well-known and well-studied family of recommender systems; however, the current literature does not provide any theoretical explanation of the conditions under which item-based recommendations will succeed or fail. We investigate the existence of an ideal item-based CF method able to make perfect recommendations. This CF model is formalized as an eigenvalue problem, where estimated ratings are equivalent to the true (unknown) ratings multiplied by a user-specific eigenvalue of the similarity matrix. Preliminary experiments show that the magnitude of the eigenvalue is proportional to the accuracy of recommendations for that user and therefore it can provide a reliable measure of confidence.
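The eigenvalue formulation described in this abstract can be written compactly as follows. The notation is an assumption made for illustration and may differ from the paper's: $r_u$ is the row vector of user $u$'s true ratings, $S$ is the item-item similarity matrix, $\hat{r}_u$ is the vector of estimated ratings, and $\lambda_u$ is the user-specific eigenvalue.

```latex
\hat{r}_u = r_u S \quad \text{(item-based prediction)}, \qquad
r_u S = \lambda_u \, r_u \quad \text{(ideal model)}
```

Under this reading, the ideal model requires each user's rating vector to be a left eigenvector of $S$, and the estimated ratings preserve the true ratings up to the scale factor $\lambda_u$.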