DS, Mar 27

Improved Approximation Algorithms and Hardness Results for Shortest Common Superstring with Reverse Complements

arXiv:2603.26176 | 47.7 | h-index: 1

AI Analysis

The paper addresses a fundamental computational problem in genome assembly, offering incremental improvements to both the best-known approximation ratio and the known hardness results.

The paper tackles the Shortest Common Superstring with Reverse Complements (SCS-RC) problem in genome assembly. It presents a new approximation algorithm that improves the ratio from 23/8 to 8/3, and shows that it is NP-hard to approximate SCS-RC within a factor better than 333/332.

The Shortest Common Superstring (SCS) problem is a fundamental task in sequence analysis. In genome assembly, however, the double-stranded nature of DNA implies that each fragment may occur either in its original orientation or as its reverse complement. This motivates the Shortest Common Superstring with Reverse Complements (SCS-RC) problem, which asks for a shortest string that contains, for each input string, either the string itself or its reverse complement as a substring. The previously best-known approximation ratio for SCS-RC was $\frac{23}{8}$. In this paper, we present a new approximation algorithm achieving an improved ratio of $\frac{8}{3}$. Our approach computes an optimal constrained cycle cover by reducing the problem, via a novel gadget construction, to a maximum-weight perfect matching in a general graph. We also investigate the computational hardness of SCS-RC. While the decision version is known to be NP-complete, no explicit inapproximability results were previously established. We show that the hardness of SCS carries over to SCS-RC through a polynomial-time reduction, implying that it is NP-hard to approximate SCS-RC within a factor better than $\frac{333}{332}$. Notably, this hardness result holds even for the DNA alphabet.
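To make the problem definition concrete, the following is a minimal Python sketch of the SCS-RC feasibility condition: a candidate string is valid if, for every input fragment, it contains either the fragment or its reverse complement. The function names `reverse_complement` and `is_rc_superstring` are illustrative assumptions, not taken from the paper, and the sketch only checks a candidate; it does not implement the paper's approximation algorithm.

```python
# Watson-Crick base pairing over the DNA alphabet.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(s: str) -> str:
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def is_rc_superstring(candidate: str, fragments: list[str]) -> bool:
    """Check the SCS-RC condition: each fragment, or its reverse
    complement, must occur in the candidate as a substring."""
    return all(
        f in candidate or reverse_complement(f) in candidate
        for f in fragments
    )


fragments = ["ACG", "CGT", "AAC"]
# "CGT" is the reverse complement of "ACG", so one occurrence covers both.
print(is_rc_superstring("AACG", fragments))  # True
print(is_rc_superstring("AAC", fragments))   # False
```

In this example, "AACG" (length 4) is a valid SCS-RC solution, whereas a plain common superstring of the same fragments, such as "AACGT", needs length 5 because it must contain "CGT" itself.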
