Guohui Lin

h-index1
2papers

2 Papers

22.3DSMay 23
Covering vertices by sequential stars

Mengyuan Hu, An Zhang, Yong Chen et al.

We study the problem of covering the maximum number of vertices in a graph by a collection of vertex-disjoint stars, each with a number of satellites in a given interval $[k, \ell]$, where $1 \le k < \ell$ and $\ell$ can be infinity. This is referred to as sequential {\sc $[k, \ell]$-Star Packing} problem. It is solvable in polynomial time when $k = 1$, but becomes strongly NP-hard when $k \ge 2$. In this paper, we propose either the first or an improved approximation algorithm for the following four sequential settings: 1) a $\frac {k+1}2$-approximation algorithm when $k \ge 3$ and $\ell = \infty$, improving the previous best ratio of $\frac {(k+1)^2}{2k+1}$; 2) a $\frac 43$-approximation algorithm when $k = 2$ and $\ell = \infty$, improving the previous best ratio of $\frac 32$; 3) the first $(1 + \frac \ell{\ell+1})$-approximation algorithm when $2 = k < \ell$; and 4) the first $(1 + \max\left\{\frac {k-1}2, \frac {(k+1) \ell}{3 (\ell+1)}\right\})$-approximation algorithm when $3 \le k < \ell$. Besides the main algorithmic techniques being local search coupled with amortized analysis, we observe augmenting configurations to bridge two distant neighborhoods for a local improvement operation. Additionally, the problem has been shown APX-hard when $k \ge 3$; we prove its APX-hardness for the last remaining case where $k = 2$.

QMDec 8, 2024
Pre-trained protein language model for codon optimization

Shashank Pathak, Guohui Lin

Motivation: Codon optimization of Open Reading Frame (ORF) sequences is essential for enhancing mRNA stability and expression in applications like mRNA vaccines, where codon choice can significantly impact protein yield which directly impacts immune strength. In this work, we investigate the use of a pre-trained protein language model (PPLM) for getting a rich representation of amino acids which could be utilized for codon optimization. This leaves us with a simpler fine-tuning task over PPLM in optimizing ORF sequences. Results: The ORFs generated by our proposed models outperformed their natural counterparts encoding the same proteins on computational metrics for stability and expression. They also demonstrated enhanced performance against the benchmark ORFs used in mRNA vaccines for the SARS-CoV-2 viral spike protein and the varicella-zoster virus (VZV). These results highlight the potential of adapting PPLM for designing ORFs tailored to encode target antigens in mRNA vaccines.