Levent Sarioglu

DS
5papers
Novelty38%
AI Score45

5 Papers

52.4DSJun 3
The Cascade Log: Reference-Stable Windowing over Tiered Append Sequences

Faruk Alpay, Levent Sarioglu

A long-running append-mostly sequence, such as an edit log, event store, or versioned working set, is usually tiered into a bounded hot stratum and colder folded summaries. This saves memory but breaks stable references: a handle minted while a record is hot may later be resolved after the record has moved into a digest, after it has been superseded, or while a fold is in flight. We define the resulting cross-tier anomalies--dangling, stale, corrupt, and snapshot-skewed resolution--and present the Cascade Log, a reference-stable tiered append structure. The structure keeps a single persistent coalescing interval map over handles as the sole authority on each live version; folding a contiguous run replaces many singleton entries by one digest-backed interval node, and immutable roots provide snapshot tokens. Its cost is characterized by the fragmentation $A$, the number of index pieces, namely live handles plus maximal same-digest runs. The index uses $Θ(A)$ space, resolves a point in $O(\log A)$, reports a $k$-handle range in $O(\log A+k)$, and performs $a$ appends and $s$ supersedes in $O((a/B+s)\log A)$ update work for fold block size $B$. Matching lower bounds show that $Ω(A)$ space and $Ω(\log A+k)$ ordered range cost are unavoidable, and an adversary can force $A=Θ(s)$. Thus the index is sublinear on append-dominated histories and grows linearly only under fragmenting edits. A reference implementation and reproducible experiments to $10^6$ records validate the anomaly-freedom and the fragmentation bounds.

41.4DSMay 27
Residual-Entropy Accounting for Routed Atom-Budgeted Learned Indexes

Faruk Alpay, Levent Sarioglu

We study exact predecessor and rank search in a routed, atom-budgeted, certified-repair learned-index architecture. An ordered directory routes each query to a contiguous interval, a counted local predictor returns a certified rank window, and exact repair resolves the remaining uncertainty by comparisons. The result is scoped to this architecture and does not claim guarantees for arbitrary learned-index designs such as unconstrained RMI dispatch, hash routing, neural routing, or exact-payload indexes without additional accounting. The main parameter is conditional residual answer entropy: the entropy of the exact answer after the leaf, predictor output, certificate, and charged pre-repair information are observed. We prove a two-sided accounting theorem showing that this functional gives the query-time scale under the stated architecture and local predictor-atom budget. Directory space, sorted-array storage, and transcript-indexed repair-program space are treated as separate system costs, so the theorem is not a byte-level space lower bound or a compact implementation recipe. We also give a rank-spread specialization in which the radius term log(1 + Delta) is valid only when many residual ranks remain likely after the predictor transcript is known. For counted piecewise-linear segments, we make the profile term non-oracular, derive a shadow-price allocation rule, compute finite-instance RGapM and GapM values on real SOSD and Zenodo samples, and report benchmarks against PGM-index, RadixSpline, and binary search. The benchmarks expose overheads and bottlenecks rather than claiming speed for the shadow prototype.

5.4DCMay 20
Budgeted Dynamic Trace Structures for Token-Efficient Sequential Computation

Faruk Alpay, Levent Sarioglu

Sequential computation increasingly produces long traces containing nested branches, status transitions, textual payloads, and compact summaries of earlier execution. This paper introduces budgeted dynamic trace structures (BDTS), a data-structural framework for maintaining rooted trace graphs and append-only histories under an explicit byte or token budget. BDTS combines status-filtered reachability, cursor pagination, soft-capped recency logs, reference-counted observation keys, delta overlays, bounded cost caches, and summary-plus-suffix compaction. We give formal invariants, asymptotic bounds, and an ancillary Rust implementation with reproducible benchmarks. Across synthetic traces with 10,000-40,000 vertices, the prototype builds graphs in 0.58-2.72 ms, enumerates all descendants in 0.24-1.42 ms, and compacts histories of 350k-2.71M approximate tokens to 1,048-4,120 approximate tokens. Tokenizer and forward measurements with three public model targets reduce 3,359-3,360 trace tokens to 432-433 tokens.

73.4DSApr 20
Coordinatewise Balanced Covering for Linear Gain Graphs, with an Application to Coset-List Min-2-Lin over Powers of Two

Faruk Alpay, Levent Sarioglu

We study a list-constrained extension of modular equation deletion over powers of two, called Coset-List Min-2-Lin$^{\pm}$ over $\mathbb{Z}/2^d\mathbb{Z}$. Each variable is restricted to a dyadic coset $a+2^{\ell}(\mathbb{Z}/2^d\mathbb{Z})$, each binary constraint is of the form $x_u=x_v$, $x_u=-x_v$, or $x_u=2x_v$, and the goal is to delete a minimum number of constraints so that the remaining system is satisfiable. This problem lies between the no-list case and the poorly understood fully conservative list setting. Our main technical result is a coordinatewise balanced covering theorem for linear gain graphs labeled by vectors in $\mathbb{F}_2^r$. Given any balanced subgraph of cost at most $k$, a randomized procedure outputs a vertex set $S$ and an edge set $F$ such that $(G-F)[S]$ is balanced and, with probability $2^{-O(k^2r)}$, every hidden balanced subgraph of cost at most $k$ is contained in $S$ while all incident deletions are captured by $F$. The proof tensors the one-coordinate balanced-covering theorem of Dabrowski, Jonsson, Ordyniak, Osipov, and Wahlström across coordinates, and is combined with a rank-compression theorem replacing the ambient lifted dimension by the intrinsic cycle-label rank $ρ$. We also develop a cycle-space formulation, a cut-space/potential characterization of balancedness, a minimal-dimension statement for equivalent labelings, and an explicit bit-lifting analysis for dyadic coset systems. These yield a randomized one-sided-error algorithm running in \[ 2^{O(k^2ρ+k\log(kρ+2))}\cdot n^{O(1)}+\widetilde{O}(md+ρ^ω), \] and the same framework returns a minimum-weight feasible deletion set among all solutions of size at most $k$.

CCMar 9
Algorithmic Barriers to Detecting and Repairing Structural Overspecification in Adaptive Data-Structure Selection

Faruk Alpay, Levent Sarioglu

We study algorithmic barriers to detecting and repairing a systematic form of structural overspecification in adaptive data-structure selection. An input instance induces an implied workload signature, such as ordering, sparsity, dynamism, locality, or substring structure, and candidate implementations may be preferred because they match that full signature even when the measured workload evidence supports only a strict subset of it. Under a model in which pairwise evaluators favor implementations that realize the implied signature, we show that this preference propagates through both benchmark aggregation and Bradley-Terry-Luce fitting. We then establish two main results. First, determining whether a representation-selection pipeline exhibits structural commitment beyond measured warrant is undecidable on unbounded input domains, by reduction from the halting problem, but decidable by exhaustive enumeration on finite domains. Second, under a conservative repair constraint requiring already evidence-aligned pipelines to remain unchanged, any total computable repair operator admits an overspecified fixed point via Kleene's recursion theorem. These barriers are qualitatively different from classical lower bounds in data-structure design: they do not limit efficiency on finite workloads, but the possibility of uniformly detecting and repairing overspecification across pipeline families.