CV Dec 14, 2017
Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data
Mark Kibanov, Martin Becker, Juergen Mueller et al.
The k-Nearest Neighbor (kNN) classification approach is conceptually simple, yet widely applied because it often performs well in practical applications. However, using a global constant k does not always provide an optimal solution, e.g., for datasets with an irregular density distribution of data points. This paper proposes an adaptive kNN classifier where k is chosen dynamically for each instance (point) to be classified, such that the expected accuracy of classification is maximized. We define the expected accuracy as the accuracy of a set of structurally similar observations. An arbitrary similarity function can be used to find these observations. We introduce and evaluate different similarity functions. For the evaluation, we use five different classification tasks based on geo-spatial data. Each classification task consists of (tens of) thousands of items. We demonstrate that the presented expected accuracy measures can be a good estimator for kNN performance, and that the proposed adaptive kNN classifier outperforms common kNN and previously introduced adaptive kNN algorithms. We also show that the range of considered k can be significantly reduced to speed up the algorithm without a negative influence on classification accuracy.
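The per-instance choice of k described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity function used here (the `n_similar` training points closest to the query), the leave-one-out accuracy estimate, and all function and parameter names are assumptions made for illustration. The abstract only states that an arbitrary similarity function can be used.

```python
import math
from collections import Counter


def knn_predict(train, query, k, skip=None):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (point, label) pairs; `skip` optionally excludes
    one training index (used for leave-one-out evaluation).
    """
    candidates = (i for i in range(len(train)) if i != skip)
    nearest = sorted(candidates, key=lambda i: math.dist(train[i][0], query))[:k]
    votes = Counter(train[i][1] for i in nearest)
    return votes.most_common(1)[0][0]


def expected_accuracy(train, query, k, n_similar):
    """Accuracy of kNN with this k on observations similar to `query`.

    ASSUMPTION: "similar" means the n_similar training points closest to
    the query; the paper evaluates several similarity functions.
    """
    similar = sorted(
        range(len(train)), key=lambda i: math.dist(train[i][0], query)
    )[:n_similar]
    hits = sum(
        knn_predict(train, train[i][0], k, skip=i) == train[i][1]
        for i in similar
    )
    return hits / len(similar)


def adaptive_knn_predict(train, query, k_range=(1, 3, 5), n_similar=5):
    """Pick the k with the highest expected accuracy, then classify."""
    best_k = max(
        k_range, key=lambda k: expected_accuracy(train, query, k, n_similar)
    )
    return knn_predict(train, query, best_k), best_k


train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"),
]
label, chosen_k = adaptive_knn_predict(train, (0.2, 0.2))
```

Restricting `k_range` to a few candidate values mirrors the abstract's remark that the range of considered k can be reduced to speed up the algorithm.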
IR Oct 27, 2017
Combining Aspects of Genetic Algorithms with Weighted Recommender Hybridization
Juergen Mueller
Recommender systems are an established means to inspire users to watch interesting movies, discover baby names, or read books. Recommendation quality improves further when the results of multiple recommendation algorithms are combined using hybridization methods. In this paper, we focus on the task of combining unscored recommendations into a single ensemble. Our proposed method is inspired by genetic algorithms. It repeatedly selects items from the recommendations to create a population of items that will be used for the final ensemble. We compare our method with a weighted voting method and test the performance of both in a movie and a name recommendation scenario. We were able to outperform the weighted method on both datasets by 20.3 % and 31.1 % and decreased the overall execution time by up to 19.9 %. Our results not only propose a new kind of hybridization method but also open the field of recommender hybridization to further work with genetic algorithms.
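The idea of repeatedly selecting items to build a population can be sketched with fitness-proportionate (roulette-wheel) selection, a standard selection operator from genetic algorithms. The abstract does not specify the paper's actual selection scheme, so everything below is an assumption: the per-recommender weights, the rule of taking each recommender's next unused item, and the names `select_ensemble`, `recommendations`, and `weights` are hypothetical.

```python
import random


def select_ensemble(recommendations, weights, size, seed=0):
    """Build an ensemble by roulette-wheel selection over recommenders.

    `recommendations` maps a recommender name to its ordered, unscored
    item list; `weights` maps the same names to positive weights.
    ASSUMPTION: each draw picks a recommender with probability
    proportional to its weight and takes its next item.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    queues = {name: list(items) for name, items in recommendations.items()}
    population, seen = [], set()
    while len(population) < size:
        # Only recommenders that still have items left can be drawn.
        active = [name for name, queue in queues.items() if queue]
        if not active:
            break  # every list is exhausted
        name = rng.choices(active, weights=[weights[n] for n in active], k=1)[0]
        item = queues[name].pop(0)
        if item not in seen:  # skip items another recommender already added
            seen.add(item)
            population.append(item)
    return population


recs = {"movies_cf": ["x", "y", "z"], "movies_content": ["y", "w"]}
w = {"movies_cf": 2.0, "movies_content": 1.0}
ensemble = select_ensemble(recs, w, size=3)
```

Because every draw consumes one item from some list, the loop ends after at most as many iterations as there are recommended items, even when the requested size cannot be reached.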
IR May 9, 2017
Predicting Rising Follower Counts on Twitter Using Profile Information
Juergen Mueller, Gerd Stumme
When evaluating the cause of one's popularity on Twitter, one thing is considered to be the main driver: many tweets. There is debate about the kind of tweet one should publish, but little about anything beyond tweets. Of particular interest is the information provided by each Twitter user's profile page. One such feature is the given name on the profile. Studies in psychology and economics identified correlations between the first name and, e.g., one's school marks or chances of getting a job interview in the US. Therefore, we are interested in the influence of this profile information on the follower count. We addressed this question by analyzing the profiles of about 6 million Twitter users. All profiles are separated into three groups: users that have a first name, English words, or neither in their name field. The assumption is that names and words influence the discoverability of a user and subsequently his/her follower count. We propose a classifier that labels users who will increase their follower count within a month by applying different models based on the user's group. The classifiers are evaluated with the area under the receiver operating characteristic curve (AUC) and achieve a score above 0.800.
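For readers unfamiliar with the evaluation measure, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the toy labels and scores are illustrative and do not come from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for positives (e.g. follower count will rise) and
    0 for negatives; `scores` holds the classifier's scores.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive-negative pairs ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Illustrative toy data: three of the four positive-negative pairs
# are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the reported value above 0.800 in context.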
CL Jun 17, 2016
Gender Inference using Statistical Name Characteristics in Twitter
Juergen Mueller, Gerd Stumme
Much attention has been given to the task of gender inference of Twitter users. Although names are strong gender indicators, the names of Twitter users are rarely used as a feature, probably due to the high number of ill-formed names, which cannot be found in any name dictionary. Instead of relying solely on a name database, we propose a novel name classifier. Our approach extracts characteristics from the user names and uses them to assign each name to a gender. This enables us to classify international first names as well as ill-formed names.