42.1 · DB · Mar 19
Process Faster, Pay Less: Functional Isolation for Stream Processing
Eleni Zapridou, Michael Koepf, Panagiotis Sioulas et al.
Concurrent workloads often extract insights from high-throughput, real-time data streams. Existing stream processing engines isolate each query's resources, ensuring robust performance but incurring high infrastructure costs. In contrast, sharing work reduces the amount of necessary resources but introduces inter-query interference, leading to performance degradation for some queries. We introduce FunShare, a stream-processing system that improves resource efficiency without compromising performance by dynamically grouping queries based on their performance characteristics. FunShare strategically relaxes query interdependencies and minimizes redundant computation while preserving individual query performance. It achieves this by using an adaptive optimization framework that monitors execution metrics, accurately estimates computation overlaps, and reconfigures execution plans on the fly in response to changes in the underlying data streams. Our evaluation demonstrates that FunShare minimizes resource consumption compared to isolated execution while maintaining or improving throughput for all queries.
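The grouping idea can be illustrated with a small sketch. This is not FunShare's optimizer. It assumes hypothetical per-query metrics (`QueryStats`, with an observed `tuples_per_sec` and a set of operator signatures) and a simple greedy rule: a query joins a shared group only if its throughput is within `max_ratio` of every member, so a slow query cannot drag down fast ones, and it overlaps the group's common operators by at least `min_overlap`. Overlap here is a plain set intersection, a stand-in for the paper's overlap estimation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryStats:
    """Hypothetical metrics a runtime monitor might report for one query."""
    name: str
    tuples_per_sec: float  # observed throughput (assumed > 0)
    operators: frozenset   # signatures of the operators in the query's plan


@dataclass
class Group:
    members: list = field(default_factory=list)

    def shared_ops(self):
        # Operators present in every member's plan can be executed once per group.
        return frozenset.intersection(*(q.operators for q in self.members))


def group_queries(queries, max_ratio=1.5, min_overlap=1):
    """Greedily place queries into groups of similar throughput that share work."""
    groups = []
    for q in sorted(queries, key=lambda q: q.tuples_per_sec, reverse=True):
        for g in groups:
            fastest = max(m.tuples_per_sec for m in g.members)
            slowest = min(m.tuples_per_sec for m in g.members)
            # Spread of throughputs if q joined: keeps performance characteristics similar.
            spread = max(fastest, q.tuples_per_sec) / min(slowest, q.tuples_per_sec)
            overlap = len(g.shared_ops() & q.operators)
            if spread <= max_ratio and overlap >= min_overlap:
                g.members.append(q)
                break
        else:
            # No compatible group: run the query in its own (isolated) group.
            groups.append(Group([q]))
    return groups


def saved_operator_runs(groups):
    # Each shared operator runs once per group instead of once per member.
    return sum((len(g.members) - 1) * len(g.shared_ops()) for g in groups)


example = [
    QueryStats("q1", 1000, frozenset({"scan:A", "filter:x>5", "agg:sum"})),
    QueryStats("q2", 900, frozenset({"scan:A", "filter:x>5", "agg:count"})),
    QueryStats("q3", 200, frozenset({"scan:A", "filter:x>5", "join:B"})),
]
groups = group_queries(example)
print([[m.name for m in g.members] for g in groups], saved_operator_runs(groups))
```

Here q3 shares operators with q1 and q2 but is five times slower, so the sketch keeps it isolated rather than let it throttle the faster queries. An adaptive system would rerun such a grouping step whenever refreshed metrics arrive.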
10.1 · DB · Mar 20
Low-Latency Stateful Stream Processing through Timely and Accurate Prefetching
Eleni Zapridou, Anastasia Ailamaki
Mission-critical applications often run "forever" and process large data volumes in real time while demanding low latency. To handle the large state of these applications, modern streaming engines rely on key-value stores and store state on local storage or remotely, but accessing such state inflates latency. As today's engines tightly couple the data path with state I/O, a tuple triggers state access only when it reaches a stateful operator, placing I/O on the critical path and stalling the CPU. However, the keys used to access the state are frequently known earlier in the query plan. Building on this insight, we propose Keyed Prefetching, which decouples the data path from state access by extracting future access keys at upstream operators and proactively staging the corresponding state in memory before tuples arrive. This overlaps I/O with ongoing computation and hides the latency of large-state accesses. We pair Keyed Prefetching with Timestamp-Aware Caching, a cache-eviction policy that jointly manages previously accessed and prefetched entries to use memory efficiently. Together, these techniques reduce latency for long-running, real-time queries without sacrificing throughput.
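A toy simulation can make the decoupling concrete. This is a minimal sketch, not the authors' implementation: `SlowStore` is a hypothetical stand-in for a disk- or remote-backed state store, and the eviction rule in `TimestampAwareCache` is an assumed approximation. Entries prefetched for in-flight tuples are protected; otherwise the entry with the oldest event timestamp is evicted. The upstream step calls `prefetch` as soon as it sees a key, and the stateful operator (a per-key running count) accesses the state `lookahead` tuples later. `stalls` counts store reads that land on the critical path.

```python
from collections import deque


class SlowStore:
    """Hypothetical stand-in for a disk- or remote-backed key-value state store."""
    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key, 0)

    def put(self, key, value):
        self.data[key] = value


class TimestampAwareCache:
    """Assumed policy: protect prefetched-but-unconsumed entries, else evict oldest timestamp."""
    def __init__(self, capacity, store):
        assert capacity >= 1
        self.capacity, self.store = capacity, store
        self.values, self.ts, self.pending = {}, {}, {}
        self.stalls = 0  # store reads on the critical path (misses at access time)

    def _make_room(self, allow_pending):
        while len(self.values) >= self.capacity:
            idle = [k for k in self.values if self.pending[k] == 0]
            if idle:
                victim = min(idle, key=self.ts.__getitem__)
            elif allow_pending:
                victim = max(self.values, key=self.ts.__getitem__)  # needed latest
            else:
                return False
            self.store.put(victim, self.values.pop(victim))  # write state back
            del self.ts[victim], self.pending[victim]
        return True

    def prefetch(self, key, ts):
        if key in self.values:
            self.pending[key] += 1
            self.ts[key] = max(ts, self.ts[key])
        elif self._make_room(allow_pending=False):  # best effort: never evict needed state
            self.values[key] = self.store.get(key)  # read happens off the critical path
            self.ts[key], self.pending[key] = ts, 1

    def access(self, key, ts):
        if key in self.values:
            self.pending[key] = max(0, self.pending[key] - 1)
        else:
            self.stalls += 1  # tuple waits for state I/O
            self._make_room(allow_pending=True)
            self.values[key], self.pending[key] = self.store.get(key), 0
        self.ts[key] = max(ts, self.ts.get(key, ts))
        return self.values[key]

    def update(self, key, value):
        self.values[key] = value

    def flush(self):
        for k, v in self.values.items():
            self.store.put(k, v)


def run(stream, capacity, lookahead, prefetch=True):
    store = SlowStore()
    cache = TimestampAwareCache(capacity, store)
    in_flight = deque()  # tuples between the upstream operator and the stateful one
    for ts, key in stream:
        if prefetch:
            cache.prefetch(key, ts)  # key is already known upstream
        in_flight.append((ts, key))
        if len(in_flight) > lookahead:
            t, k = in_flight.popleft()
            cache.update(k, cache.access(k, t) + 1)  # stateful operator: count per key
    while in_flight:
        t, k = in_flight.popleft()
        cache.update(k, cache.access(k, t) + 1)
    cache.flush()
    return cache.stalls, store.data


stream = list(enumerate("abcdabcd"))
print(run(stream, capacity=2, lookahead=1, prefetch=True))
print(run(stream, capacity=2, lookahead=1, prefetch=False))
```

With a cyclic key pattern that defeats a two-entry recency-based cache, every access stalls without prefetching, while prefetching one tuple ahead serves each access from memory. Both runs produce the same final counts.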