67.3 SI Mar 30
Embeddings of Nation-Level Social Networks
Tanzir Pial, Flavio Hafner, Dakota Handzlik et al.
Full nation-scale social networks are now emerging from countries such as the Netherlands and Denmark, but these networks present challenging technical issues in working with large, multiplex, time-dependent networks. We report on our experiences in producing dynamic node embeddings of the population network of the Netherlands. We present (a) a layer-sensitive random walk strategy which improves on traditional flattening methods for multiplex networks, (b) a temporal alignment strategy that brings annual networks into the same embedding space, without leaking information to future years, and (c) the use of Fibonacci spirals and embedding whitening techniques for more balanced and effective partitioning. We demonstrate the effectiveness of these techniques in building embedding-based models for 13 downstream tasks.
LG Feb 1, 2024
Combining the Strengths of Dutch Survey and Register Data in a Data Challenge to Predict Fertility (PreFer)
Elizaveta Sivak, Paulina Pankowska, Adrienne Mendrik et al.
The social sciences have produced an impressive body of research on determinants of fertility outcomes, or whether and when people have children. However, the strength of these determinants and of the underlying theories is rarely evaluated in terms of predictive ability on new data. This prevents us from systematically comparing studies, hindering the evaluation and accumulation of knowledge. In this paper, we present two datasets that can be used to study the predictability of fertility outcomes in the Netherlands. One dataset is based on the LISS panel, a longitudinal survey that includes thousands of variables on a wide range of topics, including individual preferences and values. The other is based on the Dutch register data, which lacks attitudinal data but includes detailed information about the life courses of millions of Dutch residents. We provide information about the datasets and the samples, and describe the fertility outcome of interest. We also introduce the fertility prediction data challenge PreFer, which is based on these datasets and will start in Spring 2024. We outline the ways in which measuring the predictability of fertility outcomes using these datasets and combining their strengths in the data challenge can advance our understanding of fertility behaviour and computational social science. We further provide details for participants on how to take part in the data challenge.
CR Mar 26, 2021
Secure Platform for Processing Sensitive Data on Shared HPC Systems
Michel Scheerman, Narges Zarrabi, Martijn Kruiten et al.
High-performance computing clusters operating in shared and batch mode pose challenges for processing sensitive data. At the same time, the need for secure processing of sensitive data on HPC systems is growing. In this work we present a novel method for creating secure computing environments on traditional multi-tenant high-performance computing clusters. Our platform-as-a-service provides a customizable, virtualized solution using PCOCC and SLURM to meet strict security requirements without modifying the existing HPC infrastructure. We show how this platform has been used in real-world research applications from different research domains. The solution is scalable by design with low performance overhead and can be generalized for processing sensitive data on shared HPC systems imposing high security criteria.