Yifan Chen

h-index: 13

5 papers

121 citations

Novelty: 56%

AI Score: 26

Ranked #163,064 of 194,257 authors (top 84%); #1,723 in IR (top 79%)

5 Papers

4.1 LG Apr 8, 2019

Scaling Up Collaborative Filtering Data Sets through Randomized Fractal Expansions

Francois Belletti, Karthik Lakshmanan, Walid Krichene et al.

Recommender system research suffers from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap, we propose to generate large-scale user/item interaction data sets by expanding pre-existing public data sets. Our key contribution is a technique that expands user/item incidence matrices to large numbers of rows (users), columns (items), and non-zero values (interactions). The proposed method adapts Kronecker Graph Theory to preserve key higher order statistical properties such as the fat-tailed distribution of user engagements, item popularity, and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark recommender systems and the systems employed to train them. We further apply our stochastic expansion algorithm to the binarized MovieLens 20M data set, which comprises 20M interactions between 27K movies and 138K users. The resulting expanded data set has 1.2B ratings, 2.2M users, and 855K items, which can be scaled up or down.

11.3 IR Jan 23, 2019

Scalable Realistic Recommendation Datasets through Fractal Expansions

Francois Belletti, Karthik Lakshmanan, Walid Krichene et al.

Recommender System research currently suffers from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap, we propose to generate more massive user/item interaction data sets by expanding pre-existing public data sets. User/item incidence matrices record interactions between users and items on a given platform as a large sparse matrix whose rows correspond to users and whose columns correspond to items. Our technique expands such matrices to larger numbers of rows (users), columns (items), and non-zero values (interactions) while preserving key higher order statistical properties. We adapt Kronecker Graph Theory to user/item incidence matrices and show that the corresponding fractal expansions preserve the fat-tailed distributions of user engagements, item popularity, and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark Recommender Systems and the systems employed to train them. We provide algorithms to produce such expansions and apply them to the MovieLens 20 million data set comprising 20 million ratings of 27K movies by 138K users. The resulting expanded data set has 10 billion ratings, 864K items, and 2 million users in its smaller version and can be scaled up or down. A larger version features 655 billion ratings, 7 million items, and 17 million users.
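The expansion idea in this abstract can be illustrated with a plain Kronecker product. The following is a minimal sketch, not the authors' randomized algorithm: the function name `kronecker_expand`, the toy matrices, and the hand-written 2x2 seed are all assumptions made for illustration. It only shows how the numbers of users, items, and interactions multiply under such an expansion.

```python
import numpy as np

def kronecker_expand(seed, base):
    """Deterministic Kronecker expansion (illustrative only).

    Each entry seed[i, j] is replaced by the block seed[i, j] * base,
    so rows, columns, and non-zero counts all multiply.
    """
    return np.kron(seed, base)

# Toy binarized user/item incidence matrix: 2 users x 3 items, 3 interactions.
base = np.array([[1, 0, 1],
                 [0, 1, 0]])

# Hypothetical 2x2 seed with 3 non-zero entries (an assumption, not from the paper).
seed = np.array([[1, 1],
                 [0, 1]])

expanded = kronecker_expand(seed, base)
print(expanded.shape)              # rows = 2 * 2, columns = 2 * 3
print(np.count_nonzero(expanded))  # non-zeros = 3 * 3
```

In this toy case the expanded matrix has 4 users, 6 items, and 9 interactions. The abstract describes a stochastic variant designed to preserve statistical properties of the original data, which this deterministic sketch does not attempt.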

10.8 IR Jul 16, 2018

A Collective Variational Autoencoder for Top-$N$ Recommendation with Side Information

Yifan Chen, Maarten de Rijke

Recommender systems have been studied extensively due to their practical use in many real-world scenarios. Despite this, generating effective recommendations with sparse user ratings remains a challenge. Side information associated with items has been widely utilized to address rating sparsity. Existing recommendation models that use side information are linear and, hence, have restricted expressiveness. Deep learning has been used to capture non-linearities by learning deep item representations from side information but, as side information is high-dimensional, existing deep models tend to have large input dimensionality, which dominates their overall size. This makes them difficult to train, especially with small numbers of inputs. Rather than learning item representations, which is problematic with high-dimensional side information, in this paper we propose to learn feature representations through deep learning from side information. Learning feature representations, on the other hand, ensures a sufficient number of inputs to train a deep network. To achieve this, we propose to simultaneously recover user ratings and side information by using a Variational Autoencoder (VAE). Specifically, user ratings and side information are encoded and decoded collectively through the same inference network and generation network. This is possible as both user ratings and side information are data associated with items. To account for the heterogeneity of user ratings and side information, the final layer of the generation network follows different distributions depending on the type of information. The proposed model is easy to implement and efficient to optimize and is shown to outperform state-of-the-art top-$N$ recommendation methods that use side information.
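The idea of a shared decoder whose output distribution depends on the data type can be sketched as follows. This is a hedged illustration only: the abstract does not say which distributions are used, so the Bernoulli/Gaussian pairing, the function names, and the `kind` argument are assumptions, not the authors' model.

```python
import numpy as np

def bernoulli_log_likelihood(x, logits):
    """log p(x | logits) for binary data, written in a numerically stable form."""
    return float(np.sum(x * logits - np.logaddexp(0.0, logits)))

def gaussian_log_likelihood(x, mean, sigma=1.0):
    """log p(x | mean, sigma) for real-valued data with a fixed, assumed sigma."""
    z = (x - mean) / sigma
    return float(np.sum(-0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)))

def reconstruction_log_likelihood(x, decoder_output, kind):
    """Pick the output distribution from the (hypothetical) data type tag."""
    if kind == "binary":
        return bernoulli_log_likelihood(x, decoder_output)
    if kind == "real":
        return gaussian_log_likelihood(x, decoder_output)
    raise ValueError(f"unknown data kind: {kind}")

# Same decoder output vector, scored under two different likelihoods.
binary_ll = reconstruction_log_likelihood(np.array([1.0, 0.0]), np.zeros(2), "binary")
real_ll = reconstruction_log_likelihood(np.array([0.5, -1.0]), np.array([0.5, -1.0]), "real")
```

In a full VAE these terms would be combined with a KL-divergence penalty on the latent code; that part is omitted here.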

4.0 IR Feb 6, 2017

Leveraging High-Dimensional Side Information for Top-N Recommendation

Yifan Chen, Xiang Zhao

Top-$N$ recommender systems typically utilize side information to address the problem of data sparsity. As side information grows increasingly high-dimensional, the performance of existing methods deteriorates in terms of both effectiveness and efficiency, which poses a severe technical challenge. In order to take advantage of high-dimensional side information, we propose in this paper an embedded feature selection method to facilitate top-$N$ recommendation. In particular, we propose to learn feature weights of side information, where zero-valued features are naturally filtered out. We also introduce non-negativity and sparsity to the feature weights, to facilitate feature selection and encourage low-rank structure. Accordingly, two optimization problems are put forward, in which feature selection is tightly or loosely coupled with the learning procedure, respectively. Augmented Lagrange Multiplier and Alternating Direction Method are applied to efficiently solve the problems. Experimental results demonstrate the superior recommendation quality of the proposed algorithm to that of the state-of-the-art alternatives.
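How non-negativity and sparsity together filter out features can be seen in a small example. The sketch below uses the standard proximal operator of an L1 penalty combined with a non-negativity constraint; it is an assumption that such a step resembles a subproblem in the authors' solver, and the function name and the threshold value `lam` are hypothetical.

```python
import numpy as np

def nonneg_soft_threshold(weights, lam):
    """Proximal step for lam * ||w||_1 subject to w >= 0.

    Shrinks every weight by lam and clips at zero, so small or
    negative weights become exactly zero (the feature is dropped).
    """
    return np.maximum(weights - lam, 0.0)

# Toy feature weights before the sparsity step.
w = np.array([0.9, 0.05, -0.3, 0.4])
sparse_w = nonneg_soft_threshold(w, lam=0.1)

# Indices of features that survive selection.
selected = np.flatnonzero(sparse_w)
```

Features whose weight ends at exactly zero contribute nothing downstream, which is the sense in which zero-valued features are filtered out.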

4.8 IR Jun 27, 2016

Content-Based Top-N Recommendation using Heterogeneous Relations

Yifan Chen, Xiang Zhao, Junjiao Gan et al.

Top-$N$ recommender systems have been extensively studied. However, the sparsity of user-item activities has not been well resolved. While many hybrid systems have been proposed to address the cold-start problem, the profile information has not been sufficiently leveraged. Furthermore, the heterogeneity of profiles between users and items intensifies the challenge. In this paper, we propose a content-based top-$N$ recommender system by learning the global term weights in profiles. To achieve this, we bring in PathSim, which measures node similarity well under heterogeneous relations (between users and items). Starting from the original TF-IDF value, the global term weights gradually converge, and eventually reflect both profile and activity information. To facilitate training, the derivative is reformulated into matrix form, which can easily be parallelized. We conduct extensive experiments, which demonstrate the superiority of the proposed method.
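PathSim itself has a compact matrix form. The sketch below computes it for a symmetric user-item-user meta-path, where the similarity of users i and j is 2 * M[i, j] / (M[i, i] + M[j, j]) and M counts the paths between them. The choice of meta-path, the function name, and the toy matrix are assumptions for illustration; the abstract does not specify how the paper applies PathSim.

```python
import numpy as np

def pathsim(adjacency):
    """PathSim along the symmetric meta-path row-node -> column-node -> row-node.

    adjacency[i, k] counts links from row node i (e.g. a user) to
    column node k (e.g. an item).
    """
    commuting = adjacency @ adjacency.T           # path counts between row nodes
    diag = np.diag(commuting)
    denom = diag[:, None] + diag[None, :]
    sim = np.zeros(commuting.shape, dtype=float)  # 0 where both nodes are isolated
    np.divide(2.0 * commuting, denom, out=sim, where=denom > 0)
    return sim

# Toy data: users 0 and 1 interact with the same two items, user 2 with a third.
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
sim = pathsim(A)
```

Here users 0 and 1 have identical item sets, so their similarity is 1, while user 2 shares no items with them and gets similarity 0.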