DB LG SI

May 20, 2019

Ingesting High-Velocity Streaming Graphs from Social Media Sources

Subhasis Dasgupta, Aditya Bagchi, Amarnath Gupta

arXiv:1905.08337v11.2

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific challenge for data scientists and engineers doing social network analysis, offering incremental improvements to data ingestion.

The paper tackles the problem of efficiently ingesting high-velocity, bursty streaming graph data from social media into graph databases. It proposes an adaptive buffering mechanism and a graph compression technique, and reports experiments showing that they improve ingestion efficiency.

Many data science applications, such as social network analysis, use graphs as their primary form of data. However, acquiring graph-structured data from social media presents several challenges. The first is the high velocity and bursty nature of social media data. The second is that the complexity of the data makes the ingestion process expensive. If we want to store the streaming graph data in a graph database, we face a third challenge: the database is often unable to sustain the ingestion of high-velocity, high-burst data. We have developed an adaptive buffering mechanism and a graph compression technique that effectively mitigate the problem. A novel aspect of our method is that the adaptive buffering algorithm uses the data rate, the data content, and the CPU resources of the database machine to determine an optimal data ingestion mechanism. We further show that an ingestion-time graph-compression strategy improves the efficiency of data ingestion into the database. We have verified the efficacy of our ingestion optimization strategy through extensive experiments.
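
The abstract gives no algorithmic detail, so the following is only a minimal sketch of what rate- and CPU-aware buffering with ingestion-time compression could look like, not the authors' method. Every name here (`choose_batch_size`, `compress`, `AdaptiveBuffer`), the hold-time formula, and the default thresholds are hypothetical. Merging duplicate edges into counts stands in for the paper's unspecified compression technique, and the sizing policy ignores data content, which the paper's algorithm also takes into account.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical edge representation: (source id, target id).
Edge = Tuple[str, str]


def choose_batch_size(rate_per_s: float, cpu_load: float,
                      min_batch: int = 100, max_batch: int = 10_000) -> int:
    """Assumed policy: hold data longer when the database CPU is busy.

    cpu_load is a fraction in [0, 1]. An idle database gets about one
    second of arrivals per batch; a saturated one gets about five.
    """
    cpu_load = min(max(cpu_load, 0.0), 1.0)
    hold_seconds = 1.0 + 4.0 * cpu_load
    return max(min_batch, min(max_batch, int(rate_per_s * hold_seconds)))


def compress(edges: List[Edge]) -> Dict[Edge, int]:
    """Stand-in compression: merge repeated edges into one weighted edge."""
    return dict(Counter(edges))


class AdaptiveBuffer:
    """Accumulates edges and releases a compressed batch when large enough."""

    def __init__(self) -> None:
        self.pending: List[Edge] = []

    def add(self, edge: Edge) -> None:
        self.pending.append(edge)

    def maybe_flush(self, rate_per_s: float, cpu_load: float) -> Dict[Edge, int]:
        """Return a compressed batch, or an empty dict if still buffering."""
        if len(self.pending) < choose_batch_size(rate_per_s, cpu_load):
            return {}
        batch, self.pending = self.pending, []
        return compress(batch)
```

In this sketch a burst of repeated interactions (for example, many retweets along the same edge) reaches the database as a single weighted write, and a busy database is sent fewer, larger batches.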
