Ran Tamir

IT
h-index25
3papers
66citations
Novelty33%
AI Score41

3 Papers

ITMay 17
Concatenated Codes for Short-Molecule DNA Storage with Sequencing Channels of Positive Zero-Undetected-Error Capacity

Ran Tamir, Nir Weinberger, Albert Guillén i Fàbregas

We study the amount of reliable information that can be stored in a DNA-based storage system with noisy sequencing, where each codeword is composed of short DNA molecules. We analyze a concatenated coding scheme, where the outer code is designed to handle the random sampling, while the inner code is designed to handle the random sequencing noise. We assume that the sequencing channel is symmetric and choose the inner coding scheme to be composed by a linear block code and a zero-undetected-error decoder. As a byproduct, the resulting optimal maximum-likelihood decoder land itself for an amenable analysis, and we are able to derive an achievability bound for the scaling of the number of information bits that can be reliably stored. As a result of independent interest, we prove that the average error probability of random linear block codes under zero-undetected-error decoding converges to zero exponentially fast with the block length, as long as its coding rate does not exceed some critical value, which is known to serve as a lower bound to the zero-undetected-error capacity.

ITNov 2, 2022
On Correlation Detection and Alignment Recovery of Gaussian Databases

Ran Tamir

In this work, we propose an efficient two-stage algorithm solving a joint problem of correlation detection and partial alignment recovery between two Gaussian databases. Correlation detection is a hypothesis testing problem; under the null hypothesis, the databases are independent, and under the alternate hypothesis, they are correlated, under an unknown row permutation. We develop bounds on the type-I and type-II error probabilities, and show that the analyzed detector performs better than a recently proposed detector, at least for some specific parameter choices. Since the proposed detector relies on a statistic, which is a sum of dependent indicator random variables, then in order to bound the type-I probability of error, we develop a novel graph-theoretic technique for bounding the $k$-th order moments of such statistics. When the databases are accepted as correlated, the algorithm also recovers some partial alignment between the given databases. We also propose two more algorithms: (i) One more algorithm for partial alignment recovery, whose reliability and computational complexity are both higher than those of the first proposed algorithm. (ii) An algorithm for full alignment recovery, which has a reduced amount of calculations and a not much lower error probability, when compared to the optimal recovery procedure.

CLNov 29, 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Angelika Romanou, Negar Foroutan, Anna Sotnikova et al.

The performance differential of large language models (LLM) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (\ie, multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other than English. Moreover, current practices in multilingual benchmark construction often translate English resources, ignoring the regional and cultural knowledge of the environments in which multilingual systems would be used. In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts. Our novel resource, INCLUDE, is a comprehensive knowledge- and reasoning-centric benchmark across 44 written languages that evaluates multilingual LLMs for performance in the actual language environments where they would be deployed.