Zi-Ke Zhang

IR
h-index: 2
15 papers
844 citations
Novelty: 41%
AI Score: 27

15 Papers

LG, Jan 30, 2024
Data organization limits the predictability of binary classification

Fei Jing, Zi-Ke Zhang, Yi-Cheng Zhang et al.

The structure of data organization is widely recognized as having a substantial influence on the efficacy of machine learning algorithms, particularly in binary classification tasks. Our research provides a theoretical framework suggesting that the maximum potential of binary classifiers on a given dataset is primarily constrained by the inherent qualities of the data. Through both theoretical reasoning and empirical examination, we employed standard objective functions, evaluative metrics, and binary classifiers to arrive at two principal conclusions. Firstly, we show that the theoretical upper bound of binary classification performance on actual datasets can be theoretically attained. This upper boundary represents a calculable equilibrium between the learning loss and the metric of evaluation. Secondly, we have computed the precise upper bounds for three commonly used evaluation metrics, uncovering a fundamental uniformity with our overarching thesis: the upper bound is intricately linked to the dataset's characteristics, independent of the classifier in use. Additionally, our subsequent analysis uncovers a detailed relationship between the upper limit of performance and the level of class overlap within the binary classification data. This relationship is instrumental for pinpointing the most effective feature subsets for use in feature engineering.
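
As a rough illustration of why class overlap caps performance regardless of the classifier, the sketch below computes the best accuracy any deterministic classifier could reach on a dataset with discrete features: when identical feature vectors carry different labels, at most the majority label of each group can be predicted correctly. This is a simplified view under stated assumptions, not the paper's derivation; the function name `max_accuracy` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def max_accuracy(samples):
    """Best accuracy achievable by any deterministic classifier.

    `samples` is a list of (features, label) pairs with hashable,
    discrete features. Samples that share a feature vector must all
    receive the same prediction, so at most the majority label of each
    group can be correct.
    """
    label_counts = defaultdict(Counter)
    for features, label in samples:
        label_counts[tuple(features)][label] += 1
    correct = sum(max(counts.values()) for counts in label_counts.values())
    return correct / len(samples)


# Toy data (assumed): (0, 1) and (1, 1) each appear with both labels.
data = [
    ((0, 1), 1), ((0, 1), 0), ((0, 1), 1),
    ((1, 0), 0),
    ((1, 1), 1), ((1, 1), 0),
]
print(max_accuracy(data))
```

With no overlapping feature vectors the bound is 1.0; each group of conflicting samples lowers it, which is one way to compare candidate feature subsets.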

SOC-PH, Aug 3, 2021
Effective Model Integration Algorithm for Improving Link and Sign Prediction in Complex Networks

Chuang Liu, Shimin Yu, Ying Huang et al.

Link and sign prediction in complex networks are valuable for decision-making and recommender systems, for example in predicting potential relationships or relative status levels. Many previous studies focused on designing specialized algorithms to perform either link prediction or sign prediction. In this work, we propose an effective model integration algorithm consisting of network embedding, network feature engineering, and an integrated classifier, which can perform link and sign prediction in the same framework. Network embedding can accurately represent the characteristics of topological structures, and combining it with network feature engineering and the integrated classifier achieves better prediction. Experiments on several datasets show that the proposed model can achieve state-of-the-art or competitive performance for both link and sign prediction in spite of its generality. Interestingly, we find that using only a very low network embedding dimension can yield high prediction performance, which can significantly reduce the computational overhead during training and prediction. This study offers a powerful methodology for multi-task prediction in complex networks.

IR, Oct 8, 2015
A vertex similarity index for better personalized recommendation

Ling-Jiao Chen, Zi-Ke Zhang, Jin-Hu Liu et al.

Recommender systems help us tackle the problem of information overload by predicting our potential choices among diverse niche objects. So far, a variety of personalized recommendation algorithms have been proposed, and most of them are based on similarities, such as collaborative filtering and mass diffusion. Here, we propose a novel vertex similarity index named CosRA, which combines the advantages of both the cosine index and the resource-allocation (RA) index. By applying the CosRA index to real recommender systems including MovieLens, Netflix and RYM, we show that the CosRA-based method performs better in accuracy, diversity and novelty than some benchmark methods. Moreover, the CosRA index is free of parameters, which is a significant advantage in real applications. Further experiments show that introducing two tunable parameters does not remarkably improve the overall performance of the CosRA index.
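
The abstract does not spell out the index, so the following is only a sketch of one natural way to combine the two ingredients: the RA-style sum over common users, each weighted by the inverse of that user's degree, normalized by the cosine-style factor, the square root of the product of the two item degrees. The exact definition in the paper may differ; `cosra_similarity` and the toy data are hypothetical names chosen for illustration.

```python
from math import sqrt


def cosra_similarity(user_items):
    """Item-item similarity combining cosine and resource-allocation ideas.

    `user_items` maps each user to the set of items they collected.
    For items a and b the score is
        sum over common users i of 1 / k_i, divided by sqrt(k_a * k_b),
    where k_i is the user's degree and k_a, k_b are item degrees.
    Only pairs with at least one common user appear in the result.
    """
    item_degree = {}
    for items in user_items.values():
        for item in items:
            item_degree[item] = item_degree.get(item, 0) + 1

    # RA-style accumulation: each common user contributes 1 / k_user.
    shared = {}
    for items in user_items.values():
        for a in items:
            for b in items:
                shared[(a, b)] = shared.get((a, b), 0.0) + 1.0 / len(items)

    # Cosine-style normalization by the item degrees.
    return {
        (a, b): value / sqrt(item_degree[a] * item_degree[b])
        for (a, b), value in shared.items()
    }


# Toy bipartite network (assumed data).
sim = cosra_similarity({"u1": {"A", "B"}, "u2": {"A"}, "u3": {"B", "C"}})
print(sim[("A", "B")], sim.get(("A", "C")))
```

No parameter appears anywhere in the computation, which matches the parameter-free property the abstract highlights.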

IR, Apr 19, 2014
Promoting cold-start items in recommender systems

Jin-Hu Liu, Tao Zhou, Zi-Ke Zhang et al.

As one of the major challenges, the cold-start problem plagues nearly all recommender systems. In particular, new items tend to be overlooked, impeding the development of new products online. Given limited resources, how to utilize the knowledge of recommender systems and design an efficient marketing strategy for new items is extremely important. In this paper, we convert this tricky issue into a clear mathematical problem based on a bipartite network representation. Under the most widely used algorithm in real e-commerce recommender systems, the so-called item-based collaborative filtering, we show that simply pushing new items to active users is not a good strategy. To our surprise, experiments on real recommender systems indicate that connecting new items with some less active users statistically yields better performance, namely these new items have more chance to appear in other users' recommendation lists. Further analysis suggests that the disassortative nature of recommender systems contributes to this observation. In short, an in-depth understanding of recommender systems could pave the way for owners to popularize their cold-start products at low cost.

IR, Apr 7, 2014
Multi-Linear Interactive Matrix Factorization

Lu Yu, Chuang Liu, Zi-Ke Zhang

Recommender systems, which can significantly help users find items of interest in the information era, have attracted increasing attention from both the scientific and application communities. One of the most widely applied recommendation methods is Matrix Factorization (MF). However, most MF-based approaches focus on the user-item rating matrix but ignore the ingredients that may have significant influence on users' preferences for items. In this paper, we propose a multi-linear interactive MF algorithm (MLIMF) to model the interactions between the users and each event associated with their final decisions. Our model considers not only the user-item rating information but also the pairwise interactions based on some empirically supported factors. In addition, we compared the proposed model with three other typical methods: user-based collaborative filtering (UCF), item-based collaborative filtering (ICF) and regularized MF (RMF). Experimental results on two real-world datasets, MovieLens 1M and MovieLens 100k, show that our method performs much better than the other three methods in recommendation accuracy. This work may shed some light on the in-depth understanding of modeling users' online behaviors and the consequent decisions.

IR, Sep 3, 2013
Information Filtering via Collaborative User Clustering Modeling

Chu-Xu Zhang, Zi-Ke Zhang, Lu Yu et al.

The past few years have witnessed the great success of recommender systems, which can significantly help users find personalized items in the information era. One of the most widely applied recommendation methods is Matrix Factorization (MF). However, most research on this topic has focused on mining the direct relationships between users and items. In this paper, we optimize the standard MF by integrating a user clustering regularization term. Our model considers not only the user-item rating information but also the user interest. We compared the proposed model with three other typical methods: User-Mean (UM), Item-Mean (IM) and standard MF. Experimental results on a real-world dataset, MovieLens, show that our method performs much better than the other three methods in recommendation accuracy.

SOC-PH, Jun 18, 2013
Gravity Effects on Information Filtering and Network Evolving

Jin-Hu Liu, Zi-Ke Zhang, Chengcheng Yang et al.

In this paper, based on the gravity principle of classical physics, we propose a tunable gravity-based model, which considers tag usage patterns to weigh both the mass and distance of network nodes. We then apply this model to the problems of information filtering and network evolution. Experimental results on two real-world data sets, Del.icio.us and MovieLens, show that it can not only enhance the algorithmic performance but also better characterize the properties of real networks. This work may shed some light on the in-depth understanding of the effect of the gravity model.

SOC-PH, May 31, 2013
Heterogeneity Involved Network-based Algorithm Leads to Accurate and Personalized Recommendations

Tian Qiu, Tian-Tian Wang, Zi-Ke Zhang et al.

Heterogeneity of both the source and target objects is taken into account in a network-based algorithm for the directional resource transformation between objects. Based on a biased heat conduction recommendation method (BHC), which considers the heterogeneity of the target object, we propose a heterogeneous heat conduction algorithm (HHC) by further taking the source object degree as the weight of diffusion. Tested on three real datasets, Netflix, RYM and MovieLens, the HHC algorithm is found to give better recommendations in both accuracy and personalization than two excellent algorithms, i.e., the original BHC and a hybrid algorithm of heat conduction and mass diffusion (HHM), while not requiring any additional information or parameters. Moreover, HHC even improves the recommendation accuracy on cold objects (the so-called cold-start problem), effectively relieving the recommendation bias on objects with different levels of popularity.

DATA-AN, Sep 1, 2012
Anchoring Bias in Online Voting

Zimo Yang, Zi-Ke Zhang, Tao Zhou

Voting online with explicit ratings could largely reflect people's preferences and objects' qualities, but ratings are always irrational, because they may be affected by many unpredictable factors like mood, weather, as well as other people's votes. By analyzing two real systems, this paper reveals a systematic bias embedded in individual decision-making processes, namely people tend to give a low rating after a low rating, as well as a high rating following a high rating. This so-called anchoring bias is validated via extensive comparisons with null models, and numerically speaking, the extent of the bias decays with the interval voting number in a logarithmic form. Our findings could be applied in the design of recommender systems and considered as important complementary materials to previous knowledge about anchoring effects on financial trades, performance judgements, auctions, and so on.
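
One simple way to look for such a bias in a single user's rating sequence is to compare the mean rating that follows each previous rating with the same quantity on shuffled sequences. The sketch below is an illustration under that assumption, not the paper's procedure; the function names, the number of trials, and the toy sequence are all hypothetical.

```python
import random
from collections import defaultdict


def mean_next_rating(ratings):
    """Map each rating value to the mean of the ratings that follow it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for prev, nxt in zip(ratings, ratings[1:]):
        sums[prev] += nxt
        counts[prev] += 1
    return {r: sums[r] / counts[r] for r in sums}


def shuffled_baseline(ratings, trials=200, seed=0):
    """Average of mean_next_rating over randomly shuffled copies.

    Shuffling destroys temporal order, so this serves as a simple
    null model in which the previous rating cannot anchor the next.
    """
    rng = random.Random(seed)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(trials):
        shuffled = list(ratings)
        rng.shuffle(shuffled)
        for prev, mean in mean_next_rating(shuffled).items():
            sums[prev] += mean
            counts[prev] += 1
    return {r: sums[r] / counts[r] for r in sums}


# Toy sequence (assumed): low ratings cluster together, as do high ones.
ratings = [1, 2, 1, 2, 5, 4, 5]
observed = mean_next_rating(ratings)
baseline = shuffled_baseline(ratings)
print(observed)
```

If the observed mean after a low rating sits clearly below the shuffled baseline, and the mean after a high rating clearly above it, the sequence is consistent with anchoring.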

IR, Jun 14, 2012
A two-step Recommendation Algorithm via Iterative Local Least Squares

Jinhu Liu, Chengcheng Yang, Zi-Ke Zhang

Recommender systems can change our lives considerably and help us select suitable and favorite items much more conveniently and easily. As a consequence, various kinds of algorithms have been proposed in the last few years to improve their performance. However, all of them face one critical problem: data sparsity. In this paper, we propose a two-step recommendation algorithm via iterative local least squares (ILLS). Firstly, we obtain the ratings matrix, which is constructed from users' behavioral records and is normally very sparse. Secondly, we preprocess the "ratings" matrix through ProbS, which converts the sparse data into dense data. Then we use ILLS to estimate the missing values. Finally, the recommendation list is generated. Experimental results on three datasets, MovieLens, Netflix and RYM, suggest that the proposed method can enhance the algorithmic accuracy as measured by AUC. In particular, it performs much better on dense datasets. Furthermore, since this method can estimate the missing values more accurately via iteration, it might shed light on discovering inactive users' purchasing intentions and eventually solving the cold-start problem.

IR, May 13, 2012
Promotional effect on cold start problem and diversity in a data characteristic based recommendation method

Tian Qiu, Zi-Ke Zhang, Guang Chen

Pure methods generally perform excellently in either recommendation accuracy or diversity, whereas hybrid methods generally outperform pure ones in both recommendation accuracy and diversity, but face the dilemma of selecting the optimal hybridization parameter for different recommendation focuses. In this article, based on a user-item bipartite network, we propose a data characteristic based algorithm that relates the hybridization parameter to the data characteristic. Different from previous hybrid methods, the present algorithm adaptively assigns the optimal parameter to each individual item according to the correlation between the algorithm and the item degrees. Compared with a highly accurate pure method, and a hybrid method which is outstanding in both recommendation accuracy and diversity, our method shows a remarkable promotional effect on the long-standing challenge of the cold start, as well as on recommendation diversity, while simultaneously keeping a high overall recommendation accuracy. Even compared with an improved hybrid method which is highly efficient on the cold-start problem, the proposed method not only further improves the recommendation accuracy of the cold items, but also enhances the recommendation diversity. Our work might provide a promising way to better solve personalized recommendation from the perspective of relating algorithms to dataset properties.

IR, Apr 9, 2012
Social Recommender Systems Based on Coupling Network Structure Analysis

Xiao Hu, Chuibo Chen, Xiaolong Chen et al.

The past few years have witnessed the great success of recommender systems, which can significantly help users find relevant and interesting items in the information era. However, much of the research in this area mainly focuses on predicting missing links in bipartite user-item networks (represented as behavioral networks). Comparatively, the social impact, especially the network structure based properties, remains relatively understudied. In this paper, we first obtain five corresponding network-based features, including user activity, average neighbors' degree, clustering coefficient, assortative coefficient and discrimination, from social and behavioral networks, respectively. A hybrid algorithm is proposed to integrate those features from the two respective networks. Subsequently, we employ a machine learning process that uses those features to provide recommendation results via a binary classifier. Experimental results on a real dataset, Flixster, suggest that the proposed method can significantly enhance the algorithmic accuracy. In addition, as the network-based properties consider not only the social activities but also user preferences in the behavioral networks, the method performs much better than one using either social or behavioral networks alone. Furthermore, since the features based on the behavioral network contain more diverse and meaningful structural information, they play a vital role in uncovering users' potential preferences, which might shed light on a deeper understanding of the structure and function of the social and behavioral networks.

IR, Feb 27, 2012
Tag-Aware Recommender Systems: A State-of-the-art Survey

Zi-Ke Zhang, Tao Zhou, Yi-Cheng Zhang

In the past decade, Social Tagging Systems have attracted increasing attention from both the physical and computer science communities. Besides the underlying structure and dynamics of tagging systems, many efforts have been devoted to unifying tagging information to reveal user behaviors and preferences, extract the latent semantic relations among items, make recommendations, and so on. Specifically, this article summarizes recent progress on tag-aware recommender systems, emphasizing the contributions from three mainstream perspectives and approaches: network-based methods, tensor-based methods, and topic-based methods. Finally, we outline some other tag-related works and future challenges of tag-aware recommendation algorithms.

DATA-AN, Feb 14, 2012
Scaling Laws in Human Language

Linyuan Lu, Zi-Ke Zhang, Tao Zhou

Zipf's law on word frequency is observed in English, French, Spanish, Italian, and so on, yet it does not hold for Chinese, Japanese or Korean characters. A model of the writing process is proposed to explain the above difference, which takes into account the effects of finite vocabulary size. Experiments, simulations and the analytical solution agree well with each other. The results show that the frequency distribution follows a power law with exponent equal to 1, at which the corresponding Zipf's exponent diverges. In fact, the distribution obeys an exponential form in the Zipf plot. Deviating from Heaps' law, the number of distinct words grows with the text length in three stages: it grows linearly in the beginning, then turns to a logarithmic form, and eventually saturates. This work refines the previous understanding of Zipf's law and Heaps' law in language systems.
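
The two quantities discussed here can be measured directly on any token sequence. The sketch below computes the rank-ordered frequencies used in a Zipf plot and the growth of the number of distinct tokens with text length used to test Heaps' law. It only measures these curves and does not implement the paper's writing-process model; the function names and the toy text are hypothetical.

```python
from collections import Counter


def rank_frequency(tokens):
    """Token frequencies sorted from most to least frequent (Zipf plot)."""
    return sorted(Counter(tokens).values(), reverse=True)


def vocabulary_growth(tokens):
    """Number of distinct tokens seen after each position (Heaps' curve)."""
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


# Toy text (assumed); for character-based analysis pass list(text) instead.
tokens = "a b a c b a".split()
print(rank_frequency(tokens))     # [3, 2, 1]
print(vocabulary_growth(tokens))  # [1, 2, 2, 3, 3, 3]
```

With a finite vocabulary the growth curve must flatten once every token has appeared, which is the saturation stage the abstract describes.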

SOC-PH, Feb 6, 2012
Recommender Systems

Linyuan Lü, Matus Medo, Chi Ho Yeung et al.

The ongoing rapid expansion of the Internet greatly increases the necessity of effective recommender systems for filtering the abundant information. Extensive research on recommender systems is conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in future developments. In addition to algorithms, physical aspects are described to illustrate the macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has great scientific depth and combines diverse research fields, which makes it of interest to physicists as well as interdisciplinary researchers.