CYJul 25, 2020Code
Corona-Warn-App: Tracing the Start of the Official COVID-19 Exposure Notification App for GermanyJens Helge Reelfs, Oliver Hohlfeld, Ingmar Poese
On June 16, 2020, Germany launched an open-source smartphone contact tracing app ("Corona-Warn-App") to help tracing SARS-CoV-2 (coronavirus) infection chains. It uses a decentralized, privacy-preserving design based on the Exposure Notification APIs in which a centralized server is only used to distribute a list of keys of SARS-CoV-2 infected users that is fetched by the app once per day. Its success, however, depends on its adoption. In this poster, we characterize the early adoption of the app using Netflow traces captured directly at its hosting infrastructure. We show that the app generated traffic from allover Germany---already on the first day. We further observe that local COVID-19 outbreaks do not result in noticeable traffic increases.
SIMar 1, 2021
Understanding & Predicting User Lifetime with Machine Learning in an Anonymous Location-Based Social NetworkJens Helge Reelfs, Max Bergmann, Oliver Hohlfeld et al.
In this work, we predict the user lifetime within the anonymous and location-based social network Jodel in the Kingdom of Saudi Arabia. Jodel's location-based nature yields to the establishment of disjoint communities country-wide and enables for the first time the study of user lifetime in the case of a large set of disjoint communities. A user's lifetime is an important measurement for evaluating and steering customer bases as it can be leveraged to predict churn and possibly apply suitable methods to circumvent potential user losses. We train and test off the shelf machine learning techniques with 5-fold crossvalidation to predict user lifetime as a regression and classification problem; identifying the Random Forest to provide very strong results. Discussing model complexity and quality trade-offs, we also dive deep into a time-dependent feature subset analysis, which does not work very well; Easing up the classification problem into a binary decision (lifetime longer than timespan $x$) enables a practical lifetime predictor with very good performance. We identify implicit similarities across community models according to strong correlations in feature importance. A single countrywide model generalizes the problem and works equally well for any tested community; the overall model internally works similar to others also indicated by its feature importances.
CLMay 19, 2020
Word-Emoji Embeddings from large scale Messaging Data reflect real-world Semantic Associations of Expressive IconsJens Helge Reelfs, Oliver Hohlfeld, Markus Strohmaier et al.
We train word-emoji embeddings on large scale messaging data obtained from the Jodel online social network. Our data set contains more than 40 million sentences, of which 11 million sentences are annotated with a subset of the Unicode 13.0 standard Emoji list. We explore semantic emoji associations contained in this embedding by analyzing associations between emojis, between emojis and text, and between text and emojis. Our investigations demonstrate anecdotally that word-emoji embeddings trained on large scale messaging data can reflect real-world semantic associations. To enable further research we release the Jodel Emoji Embedding Dataset (JEED1488) containing 1488 emojis and their embeddings along 300 dimensions.