Solon P. Pissis

DS
3papers
99citations
Novelty50%
AI Score44

3 Papers

NEJul 3, 2022
Symbolic Regression is NP-hard

Marco Virgolin, Solon P. Pissis

Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled with heuristics such as greedy or genetic algorithms and, while some works have hinted at the possible hardness of SR, no proof has yet been given that SR is, in fact, NP-hard. This begs the question: Is there an exact polynomial-time algorithm to compute SR models? We provide evidence suggesting that the answer is probably negative by showing that SR is NP-hard.

73.8DSMay 6
Faster Algorithms for Shortest Unique or Absent Substrings

Panagiotis Charalampopoulos, Manal Mohamed, Solon P. Pissis et al.

We revisit two well-known algorithmic problems on strings: computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string $S$ of length $n$. Both problems admit folklore $\mathcal{O}(n)$-time solutions using the suffix tree of $S$. However, for small alphabets, this complexity is not necessarily optimal in the word RAM model, where a string of length $n$ over alphabet $[0,σ)$ can be stored in $\mathcal{O}(n \log σ/\log n)$ space and read in $\mathcal{O}(n \log σ/\log n)$ time. We present an $\mathcal{O}(n \log σ/\sqrt{\log n})$-time algorithm for computing a SUS of $S$. This algorithm decomposes the problem according to the length and the period of the sought substring and uses several tools and techniques, such as synchronizing sets, the analysis of runs, and wavelet trees, to reduce the computation of a SUS to a simple geometric problem. Further, we adapt this algorithm and combine it with an efficient construction of de Bruijn sequences in order to obtain an $\mathcal{O}(n \log σ/\sqrt{\log n})$-time algorithm for computing a SAS of $S$.

69.5DSMar 13
Optimal Enumeration of Eulerian Trails in Directed Graphs

Ben Bals, Solon P. Pissis, Matei Tinca

The BEST theorem, due to de Bruijn, van Aardenne-Ehrenfest, Smith, and Tutte, is a classical tool from graph theory that links the Eulerian trails in a directed graph $G=(V,E)$ with the arborescences in $G$. In particular, one can use the BEST theorem to count the Eulerian trails in $G$ in polynomial time. For enumerating the Eulerian trails in $G$, one could naturally resort to first enumerating the arborescences in $G$ and then exploiting the insight of the BEST theorem to enumerate the Eulerian trails in $G$: every arborescence in $G$ corresponds to at least one Eulerian trail in $G$. Instead, we take a simple and direct approach. Our central contribution is a remarkably simple algorithm to directly enumerate the $z_T$ Eulerian trails in $G$ in the \emph{optimal} $O(m + z_T)$ time. As a consequence, our result improves on an implementation of the BEST theorem for counting Eulerian trails in $G$ when $z_T=o(n^2)$, and, in addition, it unconditionally improves the combinatorial $O(m\cdot z_T)$-time algorithm of Conte et al. [FCT 2021] for the same task. Moreover, we show that, with some care, our algorithm can be extended to enumerate Eulerian trails in directed multigraphs in optimal time, enabling applications in bioinformatics and data privacy.