Approximate Trace Reconstruction
This work addresses the problem of efficiently reconstructing strings from noisy traces, with applications in coding theory and bioinformatics. It is incremental in that it builds on existing trace reconstruction frameworks and relaxes the accuracy goal from exact to approximate recovery.
The paper tackles the approximate trace reconstruction problem, where the goal is to output a string close to the original in edit distance using fewer traces than exact reconstruction needs. It presents algorithms that, for certain string classes, achieve edit distance within $n/\mathrm{polylog}(n)$ using $\mathrm{polylog}(n)$ traces. It also provides a lower bound showing that approximating to within $n^{1/3 - \delta}$ edit distance requires $n^{1 + 3\delta/2}/\mathrm{polylog}(n)$ traces in the worst case, for $0 < \delta < 1/3$.
In the usual trace reconstruction problem, the goal is to exactly reconstruct an unknown string of length $n$ after it passes through a deletion channel many times independently, producing a set of traces (i.e., random subsequences of the string). We consider the relaxed problem of approximate reconstruction. Here, the goal is to output a string that is close to the original one in edit distance while using far fewer traces than are needed for exact reconstruction. We present several algorithms that can approximately reconstruct strings that belong to certain classes, where the estimate is within $n/\mathrm{polylog}(n)$ edit distance, and where we only use $\mathrm{polylog}(n)$ traces (or sometimes just a single trace). These classes contain strings that require a linear number of traces for exact reconstruction and which are quite different from a typical random string. From a technical point of view, our algorithms approximately reconstruct consecutive substrings of the unknown string by aligning dense regions of traces and using a run of a suitable length to approximate each region. To complement our algorithms, we present a general black-box lower bound for approximate reconstruction, building on a lower bound for distinguishing between two candidate input strings in the worst case. In particular, this shows that approximating to within $n^{1/3 - \delta}$ edit distance requires $n^{1 + 3\delta/2}/\mathrm{polylog}(n)$ traces for $0 < \delta < 1/3$ in the worst case.
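To make the setup concrete, the following is a minimal Python sketch of the two objects the abstract refers to: a deletion channel that produces traces, and the edit distance used to judge an approximate reconstruction. It is illustrative only and does not implement the paper's algorithms. The function names, the deletion probability parameter `q`, and the choice of the standard Levenshtein distance (insertions, deletions, and substitutions) are assumptions made here, not details taken from the paper.

```python
import random


def deletion_channel(x, q, rng):
    """Return one trace of x: each symbol is deleted independently
    with probability q (hypothetical parameter name)."""
    return "".join(c for c in x if rng.random() >= q)


def is_subsequence(t, x):
    """Check that t can be obtained from x by deleting symbols."""
    it = iter(x)
    return all(c in it for c in t)


def edit_distance(a, b):
    """Levenshtein distance via the usual dynamic program,
    keeping only the previous row of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    x = "".join(rng.choice("01") for _ in range(40))
    traces = [deletion_channel(x, 0.3, rng) for _ in range(5)]
    for t in traces:
        # Every trace is a subsequence of x, so its edit distance to x
        # equals the number of deleted symbols.
        print(len(t), is_subsequence(t, x), edit_distance(t, x))
```

An approximate reconstruction algorithm would take `traces` as input and output a string `y`; its quality would then be measured as `edit_distance(x, y)`, which the abstract requires to be at most $n/\mathrm{polylog}(n)$ for the string classes it considers.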