IT | May 17

Concatenated Codes for Short-Molecule DNA Storage with Sequencing Channels of Positive Zero-Undetected-Error Capacity

Ran Tamir, Nir Weinberger, Albert Guillén i Fàbregas

arXiv:2602.12800 | 38.7 | h-index: 11

AI Analysis

This work provides theoretical bounds for DNA storage systems, which is important for advancing practical DNA-based data storage, but the results are incremental as they extend known coding theory concepts to a specific channel model.

The paper studies reliable information storage in DNA-based systems with noisy sequencing, using a concatenated coding scheme. It derives an achievability bound for the scaling of information bits and proves that the average error probability of random linear block codes under zero-undetected-error decoding converges to zero exponentially fast for rates below a critical value.

We study the amount of reliable information that can be stored in a DNA-based storage system with noisy sequencing, where each codeword is composed of short DNA molecules. We analyze a concatenated coding scheme, where the outer code is designed to handle the random sampling, while the inner code is designed to handle the random sequencing noise. We assume that the sequencing channel is symmetric and choose the inner coding scheme to consist of a linear block code and a zero-undetected-error decoder. As a byproduct, the resulting optimal maximum-likelihood decoder lends itself to an amenable analysis, and we are able to derive an achievability bound for the scaling of the number of information bits that can be reliably stored. As a result of independent interest, we prove that the average error probability of random linear block codes under zero-undetected-error decoding converges to zero exponentially fast with the block length, as long as the coding rate does not exceed some critical value, which is known to serve as a lower bound to the zero-undetected-error capacity.
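To make the idea of zero-undetected-error decoding concrete, here is a minimal sketch in Python. It is not the paper's construction. As an illustrative assumption, it uses a binary erasure channel as a simple stand-in for a channel with positive zero-undetected-error capacity, and the generator matrix `G` and the helper names (`all_codewords`, `zue_decode`, `erase`) are hypothetical. The decoder outputs a codeword only when it is the unique one with positive likelihood given the channel output, and declares an erasure otherwise, so any codeword it outputs is the transmitted one.

```python
import itertools
import random

def all_codewords(G):
    """Enumerate every codeword of the binary linear code generated by G (k x n)."""
    k = len(G)
    words = set()
    for msg in itertools.product((0, 1), repeat=k):
        # Codeword symbol j is the inner product of msg with column j of G, mod 2.
        cw = tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
        words.add(cw)
    return sorted(words)

def zue_decode(received, codebook):
    """Return the unique codeword consistent with `received`, else None.

    `received` uses None for an erased symbol. On an erasure channel, a
    codeword has positive likelihood iff it matches every unerased position.
    Returning None corresponds to a detected error (decoder erasure).
    """
    consistent = [c for c in codebook
                  if all(r is None or r == s for r, s in zip(received, c))]
    return consistent[0] if len(consistent) == 1 else None

def erase(codeword, p, rng):
    """Erase each symbol independently with probability p (assumed channel)."""
    return tuple(None if rng.random() < p else s for s in codeword)

# Hypothetical toy generator matrix (k = 2, n = 5), chosen only for illustration.
G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]
codebook = all_codewords(G)

rng = random.Random(0)
sent = codebook[2]
decoded = zue_decode(erase(sent, 0.3, rng), codebook)
# `decoded` is either None (detected erasure) or exactly `sent`.
```

In this toy setting the only failure mode is a detected erasure. The paper's result concerns how fast the probability of that event decays with the block length for random linear codes, which this sketch does not attempt to reproduce.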
