John Hunter

63.6SOC-PHMay 29

SF-LIFE: A Large-Scale Simulated Movement Dataset for the San Francisco Bay Area

Chanuka Algama, Taylor Anderson, Henrique Ferraz de Arruda et al.

We introduce SF-LIFE, a large-scale simulated movement dataset designed to accelerate research in transportation, mobility, and machine learning. The dataset contains 3,024,000,000,000 location records capturing complete, noise-free, multi-modality trajectories of 500,000 simulated agents observed at a 1Hz frequency navigating the San Francisco Bay Area network over a 70-day period. The data captures (1) needs-driven daily agendas of individual agents generated by an agent-based simulation of human patterns of life and (2) detailed kinematic trajectories moving agents across the OpenStreetMap representation of San Francisco using data from 40+ transit agencies across 9 counties. SF-LIFE provides unprecedented scale and detail as trajectories are based on real transit infrastructure using San Francisco General Transit Feed Specification (GTFS) data, having agent movements across multiple modalities, including bus, rail, bike, automobile, and walking. For this high-fidelity simulated representation of San Francisco, we provide (1) the full trajectory data annotated with transportation mode labels, (2) reduced-size versions of the trajectory data with reduced temporal frequency, (3) agent activity information describing the causal activity why an agent visits a place, (4) agent demographic data, and (5) the underlying OSM road network and building data. As the first dataset of its scale and level of detail, SF-LIFE overcomes the privacy, noise, and completeness limitations inherent in real-world tracking data, providing a robust and ethically sourced resource for research in transit optimization, human mobility analysis, and urban computing.

IRAug 19, 2022

Douglas R. Turnbull, Sean McQuillan, Vera Crabtree et al.

Popularity bias is the idea that a recommender system will unduly favor popular artists when recommending artists to users. As such, they may contribute to a winner-take-all marketplace in which a small number of artists receive nearly all of the attention, while similarly meritorious artists are unlikely to be discovered. In this paper, we attempt to measure popularity bias in three state-of-art recommender system models (e.g., SLIM, Multi-VAE, WRMF) and on three commercial music streaming services (Spotify, Amazon Music, YouTube). We find that the most accurate model (SLIM) also has the most popularity bias while less accurate models have less popularity bias. We also find no evidence of popularity bias in the commercial recommendations based on a simulated user experiment.

John Hunter

2 Papers