Loklak - A Distributed Crawler and Data Harvester for Overcoming Rate Limits
The work addresses the need of researchers and data scientists to access social media data despite platform restrictions, though it appears incremental, as it builds on existing crawler concepts.
The paper tackles the problem of collecting data from social networks like Twitter and Weibo, which impose rate limits, by introducing Loklak, a distributed crawler that enables continuous data harvesting to provide an open data repository.
Modern social networks have become sources of vast quantities of data. Access to such big data can be very useful to researchers and data scientists. In this paper we describe Loklak, an open-source, distributed, peer-to-peer crawler and scraper that supports such research on platforms like Twitter, Weibo and other social networks. Social networks such as Twitter and Weibo limit the rate at which a user can freely collect such data for research. Our crawler enables researchers to collect data continuously while overcoming the authentication barriers and rate limits these platforms impose, in order to provide a repository of open data as a service.