DC, LG | Oct 9, 2025

Adaptive Execution Scheduler for DataDios SmartDiff

arXiv:2510.07811v1 | h-index: 1

Originality: Incremental advance

AI Analysis

This is an incremental improvement in optimizing data processing systems, specifically the execution scheduling of a differencing engine evaluated on tabular data benchmarks.

The paper tackles the problem of minimizing p95 latency and limiting memory usage in a differencing engine by introducing an adaptive scheduler that tunes batch size and worker count. It reports a 23-28% reduction in p95 latency and 16-22% lower peak memory compared to a tuned warm-up heuristic baseline.

We present an adaptive scheduler for a single differencing engine (SmartDiff) with two execution modes: (i) in-memory threads and (ii) Dask-based parallelism. The scheduler continuously tunes batch size and worker/thread count within fixed CPU and memory budgets to minimize p95 latency. A lightweight preflight profiler estimates bytes/row and I/O rate; an online cost/memory model prunes unsafe actions; and a guarded hill-climb policy favors lower latency with backpressure and straggler mitigation. Backend selection is gated by a conservative working-set estimate so that in-memory execution is chosen when safe, otherwise Dask is used. Across synthetic and public tabular benchmarks, the scheduler reduces p95 latency by 23 to 28 percent versus a tuned warm-up heuristic (and by 35 to 40 percent versus fixed grid baselines), while lowering peak memory by 16 to 22 percent (25 to 32 percent vs. fixed) with zero OOMs and comparable throughput.
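The control loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: every name here (`Config`, `choose_backend`, `predicted_memory`, `hill_climb_step`), the safety and overhead factors, the neighbor moves, and the 5% improvement guard are assumptions chosen for illustration. The sketch shows three ideas only: gating the backend on a conservative working-set estimate, pruning candidate settings whose predicted memory exceeds the budget, and accepting a move only when measured p95 latency improves by a clear margin. Backpressure and straggler mitigation are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Config:
    batch_size: int  # rows per batch
    workers: int     # threads or Dask workers


def estimate_working_set(rows: int, bytes_per_row: float, safety_factor: float = 2.0) -> int:
    """Conservative working-set estimate in bytes (safety_factor is an assumed value)."""
    return int(rows * bytes_per_row * safety_factor)


def choose_backend(rows: int, bytes_per_row: float, memory_budget: int) -> str:
    """Pick in-memory execution only when the estimate fits the budget, else Dask."""
    fits = estimate_working_set(rows, bytes_per_row) <= memory_budget
    return "in_memory" if fits else "dask"


def predicted_memory(cfg: Config, bytes_per_row: float, overhead: float = 1.5) -> int:
    """Toy memory model: each worker holds one batch in flight, times an assumed overhead."""
    return int(cfg.batch_size * cfg.workers * bytes_per_row * overhead)


def neighbors(cfg: Config) -> List[Config]:
    """Candidate moves: double or halve the batch size, add or remove one worker."""
    cands = [
        Config(cfg.batch_size * 2, cfg.workers),
        Config(max(1, cfg.batch_size // 2), cfg.workers),
        Config(cfg.batch_size, cfg.workers + 1),
        Config(cfg.batch_size, max(1, cfg.workers - 1)),
    ]
    return [c for c in cands if c != cfg]


def hill_climb_step(
    cfg: Config,
    measure_p95: Callable[[Config], float],
    bytes_per_row: float,
    memory_budget: int,
    max_workers: int,
    min_gain: float = 0.05,
) -> Config:
    """One guarded hill-climb step over batch size and worker count."""
    best, best_latency = cfg, measure_p95(cfg)
    for cand in neighbors(cfg):
        # Prune actions that exceed the CPU or memory budget before trying them.
        if cand.workers > max_workers:
            continue
        if predicted_memory(cand, bytes_per_row) > memory_budget:
            continue
        latency = measure_p95(cand)
        # Guard: move only if latency improves by at least min_gain (relative).
        if latency < best_latency * (1.0 - min_gain):
            best, best_latency = cand, latency
    return best
```

In a real scheduler, `measure_p95` would come from observed batch latencies over a short window, and `bytes_per_row` from the preflight profiler; here both are plain inputs so the step can be exercised with a synthetic latency function.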
