IT · ET · IT · Jun 5

The Synthesis-Sequencing Channel for DNA-based Data Storage

arXiv:2606.07216 · 10.2
Originality: Incremental advance
AI Analysis

This work provides a more realistic model of DNA storage systems, enabling a better understanding of the trade-offs between coverage and error rates in reliable data storage.

The paper introduces the synthesis-sequencing channel, a two-stage model for DNA-based data storage that captures both synthesis and sequencing effects. It establishes the channel's information-theoretic capacity by deriving matching converse and achievability bounds when both stages are binary symmetric channels with possibly different error probabilities.
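As a point of reference only: a single read of a single synthesized copy passes through the two binary symmetric channels in cascade. Writing $p_1$ and $p_2$ for the synthesis and sequencing crossover probabilities (our notation, assumed for illustration and not necessarily the paper's), the cascade is again a binary symmetric channel, with crossover probability and capacity

$$
p_1 \star p_2 = p_1(1-p_2) + (1-p_1)\,p_2, \qquad C_{\text{single read}} = 1 - h_2(p_1 \star p_2),
$$

where $h_2(x) = -x\log_2 x - (1-x)\log_2(1-x)$ is the binary entropy function. This per-read baseline ignores physical coverage, sequencing coverage, and the dependence between reads of the same copy. Those are exactly the effects the synthesis-sequencing channel models, so this is not the paper's capacity result.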

We introduce and study the synthesis-sequencing channel, a two-stage model for DNA-based data storage that jointly captures synthesis and sequencing effects. The synthesis-sequencing channel provides a more nuanced and realistic model of the DNA storage process compared to prior work, as it distinguishes between physical coverage after synthesis and sequencing coverage after readout, relaxes the assumption of independent errors across reads, and naturally induces coverage bias through the composition of synthesis and sequencing stages. We establish the information-theoretic capacity of this channel by deriving matching converse and achievability bounds for the case where synthesis and sequencing errors are modeled by binary symmetric channels with possibly different error probabilities, under mild assumptions on the channel parameters. Our results reveal multiple trade-offs between physical coverage, synthesis errors, sequencing coverage, and sequencing errors that influence the maximum achievable rate for reliable data storage.
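The two stages described above can be illustrated with a small simulation. This is a minimal Python sketch under stated assumptions, not the paper's construction. A strand is modeled as a list of bits. Synthesis produces `num_copies` physical copies, each passing independently through a BSC with crossover probability `p_syn`. Sequencing draws `num_reads` copies uniformly at random with replacement, and each draw passes independently through a BSC with crossover probability `p_seq`. All function and parameter names are hypothetical. The sketch shows two effects the abstract highlights: uneven coverage, because some copies are read several times and others never, and shared synthesis errors across reads of the same copy.

```python
import random
from collections import Counter


def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ int(rng.random() < p) for b in bits]


def cascade_crossover(p1, p2):
    """Crossover probability of BSC(p1) followed by BSC(p2)."""
    return p1 * (1 - p2) + (1 - p1) * p2


def synthesis_sequencing(strand, num_copies, num_reads, p_syn, p_seq, seed=0):
    """Toy two-stage channel (hypothetical parameterization).

    Synthesis: num_copies physical copies, each passed through BSC(p_syn).
    Sequencing: num_reads draws of a copy, uniformly with replacement,
    each passed through its own BSC(p_seq).
    """
    rng = random.Random(seed)
    copies = [bsc(strand, p_syn, rng) for _ in range(num_copies)]
    # Index of the physical copy behind each read; uniform sampling with
    # replacement reads some copies several times and others not at all.
    sources = [rng.randrange(num_copies) for _ in range(num_reads)]
    # Reads of the same copy inherit the same synthesis errors, so errors
    # are not independent across reads.
    reads = [bsc(copies[i], p_seq, rng) for i in sources]
    return copies, sources, reads


strand = [0, 1] * 50  # toy 100-bit strand
copies, sources, reads = synthesis_sequencing(
    strand, num_copies=20, num_reads=40, p_syn=0.05, p_seq=0.02
)
coverage = Counter(sources)
print("reads per copy:", [coverage.get(i, 0) for i in range(20)])
errors = sum(r != s for read in reads for r, s in zip(read, strand))
print("empirical per-bit error rate:", errors / (len(reads) * len(strand)))
print("cascaded single-read crossover:", cascade_crossover(0.05, 0.02))
```

Each read bit is marginally a cascade of the two BSCs, so the empirical per-bit error rate should fall roughly near the cascaded crossover. It fluctuates more than independent errors would, because reads that share a copy also share that copy's synthesis errors. Uniform sampling with replacement is one simple assumption for the readout stage. It was chosen here because it produces coverage bias without any extra modeling.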
