IT | Aug 31, 2021
Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning
Daniella Bar-Lev, Itai Orr, Omer Sabary et al.
DNA-based storage is an emerging technology that enables digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions, such as exceptional information density, enhanced data durability, and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, whose main bottlenecks are scalability and accuracy, with a natural tradeoff between the two. Here we show a modular and holistic approach that combines Deep Neural Networks (DNN) trained on simulated data, Tensor-Product (TP) based Error-Correcting Codes (ECC), and a safety margin mechanism into a single coherent pipeline. We demonstrate our solution on 3.1 MB of information using two different sequencing technologies. Our work improves upon the current leading solutions with up to a 3200x increase in speed and a 40% improvement in accuracy, and offers a code rate of 1.6 bits per base in a high-noise regime. In a broader sense, our work shows a viable path to commercial DNA storage solutions, which are hindered by current information retrieval processes.
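To put the reported code rate in perspective, the following back-of-the-envelope sketch estimates how many nucleotides the 3.1 MB payload would occupy at 1.6 bits per base. It assumes that 1 MB means 10^6 bytes and that the rate applies uniformly to the whole payload; neither assumption is stated in the abstract, and the helper name `bases_needed` is illustrative only.

```python
from fractions import Fraction
import math

# Four nucleotides (A, C, G, T) can carry at most log2(4) = 2 bits per base.
MAX_BITS_PER_BASE = 2


def bases_needed(payload_bytes, bits_per_base):
    """Number of bases needed to store the payload at a given code rate."""
    payload_bits = payload_bytes * 8
    # Exact rational arithmetic avoids floating-point rounding surprises.
    return math.ceil(Fraction(payload_bits) / bits_per_base)


code_rate = Fraction(16, 10)   # 1.6 bits per base, as quoted in the abstract
payload_bytes = 3_100_000      # assumption: 3.1 MB with 1 MB = 10**6 bytes

total_bases = bases_needed(payload_bytes, code_rate)
fraction_of_max = code_rate / MAX_BITS_PER_BASE

print(total_bases)             # bases required under these assumptions
print(float(fraction_of_max))  # share of the 2 bits/base ceiling
```

Under these assumptions the payload would occupy roughly 15.5 million bases, and the code operates at 80% of the 2 bits per base ceiling.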
CR | Dec 25, 2019
Efficient Algorithm for the Linear Complexity of Sequences and Some Related Consequences
Yeow Meng Chee, Johan Chrisnata, Tuvi Etzion et al.
The linear complexity of a sequence $s$ is one of the measures of its predictability. It is the smallest degree of a linear recursion that the sequence satisfies. There are several algorithms that find the linear complexity of a periodic sequence $s$ of length $N$ (where $N$ is of some given form) over a finite field $F_q$ in $O(N)$ symbol field operations. The first such algorithm is the Games-Chan algorithm, which considers binary sequences of period $2^n$ and is known for its extreme simplicity. We generalize this algorithm and apply it efficiently to several families of binary sequences. Our algorithm is very simple: it requires $\beta N$ bit operations for a small constant $\beta$, where $N$ is the period of the sequence. We analyze the number of bit operations required by the algorithm and compare it with previous algorithms. In the process, the algorithm also finds the recursion for the shortest linear feedback shift register that generates the sequence. Some other interesting properties related to shift-register sequences, which might not be too surprising but are generally unnoted, are also consequences of our exposition.
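For readers unfamiliar with the starting point, here is a minimal sketch of the classical Games-Chan algorithm for binary sequences of period $2^n$. It is not the generalized algorithm of this paper, which the abstract does not describe in detail; the function name `games_chan` and the list-of-bits input format are assumptions made for illustration.

```python
def games_chan(period_bits):
    """Linear complexity of a binary sequence with period 2**n.

    `period_bits` holds one full period as a list of 0/1 values whose
    length is a power of two. This is the classical Games-Chan
    algorithm, not the generalization proposed in the paper.
    """
    length = len(period_bits)
    if length == 0 or length & (length - 1):
        raise ValueError("period length must be a power of two")

    s = list(period_bits)
    complexity = 0
    while len(s) > 1:
        half = len(s) // 2
        left, right = s[:half], s[half:]
        diff = [a ^ b for a, b in zip(left, right)]
        if any(diff):
            # Halves differ: the complexity exceeds `half`, continue on the XOR.
            complexity += half
            s = diff
        else:
            # Halves are equal: the sequence already has period `half`.
            s = left
    # A single remaining bit contributes 1 if it is nonzero.
    return complexity + s[0]


print(games_chan([0, 1, 0, 1]))  # alternating sequence
print(games_chan([0, 0, 0, 1]))  # single one per period
```

Each round halves the working sequence, so the total work is proportional to $N + N/2 + N/4 + \dots < 2N$ bit operations, in line with the $O(N)$ cost mentioned above.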
IT | Sep 22, 2016
PIR schemes with small download complexity and low storage requirements
Simon R. Blackburn, Tuvi Etzion, Maura B. Paterson
In the classical model for (information theoretically secure) Private Information Retrieval (PIR), a user wishes to retrieve one bit of a database that is stored on a set of $n$ servers, in such a way that no individual server gains information about which bit the user is interested in. The aim is to design schemes that minimise communication between the user and the servers. More recently, there have been moves to consider more realistic models where the total storage of the set of servers, or the per server storage, should be minimised (possibly using techniques from distributed storage), and where the database is divided into $R$-bit records with $R>1$, and the user wishes to retrieve one record rather than one bit. When $R$ is large, downloads from the servers to the user dominate the communication complexity and so the aim is to minimise the total number of downloaded bits. Shah, Rashmi and Ramchandran show that at least $R+1$ bits must be downloaded from servers in the worst case, and provide PIR schemes meeting this bound. Sun and Jafar determine the best asymptotic download cost of a PIR scheme (as $R\rightarrow\infty$), where this cost is defined as the ratio of the message length $R$ and the total number of bits downloaded. This paper provides various bounds on the download complexity of a PIR scheme, generalising those of Shah et al. to the case when the number $n$ of servers is bounded, and providing links with classical techniques due to Chor et al. The paper also provides a range of constructions for PIR schemes that are either simpler or perform better than previously known schemes, including explicit schemes that achieve the best asymptotic download complexity of Sun and Jafar with significantly lower upload complexity, and general techniques for constructing a scheme with good worst case download complexity from a scheme with good download complexity on average.
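As background for the classical model described above, the following is a minimal sketch of the textbook two-server XOR scheme in the style of Chor et al., in which the user retrieves a single bit. It is not one of the constructions from this paper. The function names (`make_queries`, `answer`, `retrieve`) are hypothetical, and the two servers are simulated by calling `answer` twice on the same replicated database.

```python
import secrets


def make_queries(db_size, index):
    """Build one query per server; each query alone is uniformly random."""
    q1 = [secrets.randbelow(2) for _ in range(db_size)]
    q2 = list(q1)
    q2[index] ^= 1  # the two queries differ only at the wanted position
    return q1, q2


def answer(database, query):
    """Server side: XOR of the database bits selected by the query."""
    bit = 0
    for d, q in zip(database, query):
        bit ^= d & q
    return bit


def retrieve(database, index):
    """User side: combine the one-bit answers from both servers."""
    q1, q2 = make_queries(len(database), index)
    # Every position except `index` is selected by both queries or by
    # neither, so those bits cancel and only database[index] remains.
    return answer(database, q1) ^ answer(database, q2)


db = [1, 0, 1, 1, 0, 0, 1, 0]
print([retrieve(db, i) for i in range(len(db))])  # reproduces db
```

In this sketch the user downloads two bits in total to recover one bit, which matches the $R+1$ lower bound quoted above for $R=1$. The upload, by contrast, is one bit per database entry to each server, which illustrates why upload complexity is a separate concern.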