David Bader

2papers

2 Papers

LGJan 13
Continuous Fairness On Data Streams

Subhodeep Ghosh, Zhihui Du, Angela Bonifati et al.

We study the problem of enforcing continuous group fairness over windows in data streams. We propose a novel fairness model that ensures group fairness at a finer granularity level (referred to as block) within each sliding window. This formulation is particularly useful when the window size is large, making it desirable to enforce fairness at a finer granularity. Within this framework, we address two key challenges: efficiently monitoring whether each sliding window satisfies block-level group fairness, and reordering the current window as effectively as possible when fairness is violated. To enable real-time monitoring, we design sketch-based data structures that maintain attribute distributions with minimal overhead. We also develop optimal, efficient algorithms for the reordering task, supported by rigorous theoretical guarantees. Our evaluation on four real-world streaming scenarios demonstrates the practical effectiveness of our approach. We achieve millisecond-level processing and a throughput of approximately 30,000 queries per second on average, depending on system parameters. The stream reordering algorithm improves block-level group fairness by up to 95% in certain cases, and by 50-60% on average across datasets. A qualitative study further highlights the advantages of block-level fairness compared to window-level fairness.

DBSep 6, 2013
A Brief Study of Open Source Graph Databases

Rob McColl, David Ediger, Jason Poovey et al.

With the proliferation of large irregular sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query languages. Many of these platforms apply graph structures and analysis techniques to enable users to ingest, update, query and compute on the topological structure of these relationships represented as set(s) of edges between set(s) of vertices. To store and process Facebook-scale datasets, they must be able to support data sources with billions of edges, update rates of millions of updates per second, and complex analysis kernels. These platforms must provide intuitive interfaces that enable graph experts and novice programmers to write implementations of common graph algorithms. In this paper, we explore a variety of graph analysis and storage platforms. We compare their capabil- ities, interfaces, and performance by implementing and computing a set of real-world graph algorithms on synthetic graphs with up to 256 million edges. In the spirit of full disclosure, several authors are affiliated with the development of STINGER.