Seongyong Park

2 Papers

QM | Dec 19, 2022 | Code
Anticancer Peptides Classification using Kernel Sparse Representation Classifier

Ehtisham Fazal, Muhammad Sohail Ibrahim, Seongyong Park et al.

Cancer is one of the most challenging diseases because of its complexity, variability, and diversity of causes. It has been a major research topic for decades, yet it is still poorly understood. To this end, multifaceted therapeutic frameworks are indispensable. Anticancer peptides (ACPs) are among the most promising treatment options, but their large-scale identification and synthesis require reliable prediction methods, which remains an open problem. In this paper, we present an intuitive classification strategy that differs from the traditional black-box approach and is based on the well-known statistical theory of sparse-representation classification (SRC). Specifically, we create over-complete dictionary matrices by embedding the composition of the K-spaced amino acid pairs (CKSAAP). Unlike traditional SRC frameworks, this strategy uses an efficient matching pursuit solver instead of the computationally expensive basis pursuit solver. Furthermore, kernel principal component analysis (KPCA) is employed to cope with non-linearity and to reduce the dimension of the feature space, whereas the synthetic minority oversampling technique (SMOTE) is used to balance the dictionary. The proposed method is evaluated on two benchmark datasets using well-known statistical measures and is found to outperform the existing methods. The results show the highest sensitivity with the most balanced accuracy, which might be beneficial in understanding structural and chemical aspects and developing new ACPs. The Google Colab implementation of the proposed method is available at the author's GitHub page (https://github.com/ehtisham-Fazal/ACP-Kernel-SRC).
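The core idea of sparse-representation classification with a matching pursuit solver can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`matching_pursuit`, `src_predict`), the toy two-dimensional dictionary, and the iteration count are all assumptions made for illustration, and the KPCA and SMOTE steps described in the abstract are omitted. A test vector is coded greedily over a dictionary of labelled training vectors, and the predicted class is the one whose atoms reconstruct it with the smallest residual.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def matching_pursuit(atoms, y, n_iter=10, tol=1e-8):
    """Greedy sparse coding of y over unit-norm atoms; returns one coefficient per atom."""
    coeffs = [0.0] * len(atoms)
    residual = list(y)
    for _ in range(n_iter):
        # Pick the atom most correlated with the current residual.
        corrs = [dot(a, residual) for a in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        if abs(corrs[j]) < tol:
            break
        coeffs[j] += corrs[j]
        residual = [r - corrs[j] * a for r, a in zip(residual, atoms[j])]
    return coeffs


def src_predict(atoms, labels, y, n_iter=10):
    """Assign y to the class whose atoms give the smallest reconstruction error."""
    atoms = [normalize(a) for a in atoms]
    coeffs = matching_pursuit(atoms, y, n_iter)
    best_label, best_err = None, float("inf")
    for label in sorted(set(labels)):
        # Reconstruct y using only the coefficients that belong to this class.
        recon = [0.0] * len(y)
        for atom, c, atom_label in zip(atoms, coeffs, labels):
            if atom_label == label:
                recon = [r + c * x for r, x in zip(recon, atom)]
        err = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
        if err < best_err:
            best_label, best_err = label, err
    return best_label


if __name__ == "__main__":
    # Hypothetical toy dictionary: two atoms per class in a 2-D feature space.
    toy_atoms = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    toy_labels = ["ACP", "ACP", "non-ACP", "non-ACP"]
    print(src_predict(toy_atoms, toy_labels, [1.0, 0.05]))
```

In practice the dictionary columns would be feature vectors of training peptides (for example CKSAAP descriptors after dimension reduction) rather than hand-picked toy vectors.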

BM | Jun 26, 2020 | Code
E3-targetPred: Prediction of E3-Target Proteins Using Deep Latent Space Encoding

Seongyong Park, Shujaat Khan, Abdul Wahab

Understanding the interactions between E3 ligases and their target substrates is important for cell biology and therapeutic development. However, experimental identification of E3-target relationships is difficult because the experiments are labor-intensive. In this article, a sequence-based E3-target prediction model is proposed for the first time. The proposed framework uses the composition of k-spaced amino acid pairs (CKSAAP) to learn the relationship between E3 ligases and their target proteins. A class-separable latent space encoding scheme is also devised that provides a compressed representation of the feature space. A thorough ablation study is performed to identify an optimal gap size for CKSAAP and the number of latent variables that can represent the E3-target relationship successfully. The proposed scheme is evaluated on an independent dataset using a variety of standard quantitative measures. In particular, it achieves an average accuracy of 70.63% on this dataset. The source code and datasets used in the study are available at the author's GitHub page (https://github.com/psychemistz/E3targetPred).
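As an illustration of the CKSAAP descriptor mentioned above, the following minimal sketch counts, for each gap size k, how often each ordered pair of amino acids occurs k positions apart, and normalizes by the number of windows. The function name `cksaap`, the `max_gap` parameter, and the choice to skip non-standard residues are assumptions for this example; the linked repository may normalize or order the features differently.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap(sequence, max_gap=2):
    """Return CKSAAP features for gaps k = 0..max_gap as one flat list.

    The result has 400 * (max_gap + 1) entries: one block of 400 pair
    frequencies per gap size.
    """
    features = []
    for k in range(max_gap + 1):
        counts = [0.0] * len(PAIRS)
        n_windows = len(sequence) - k - 1  # number of residue pairs k apart
        for i in range(max(n_windows, 0)):
            pair = sequence[i] + sequence[i + k + 1]
            idx = PAIR_INDEX.get(pair)
            if idx is not None:  # assumption: ignore non-standard residues
                counts[idx] += 1.0
        if n_windows > 0:
            counts = [c / n_windows for c in counts]
        features.extend(counts)
    return features
```

For a sequence such as `"ACAC"` with `max_gap=1`, the gap-0 block records the adjacent pairs AC, CA, AC, and the gap-1 block records the pairs AA and CC that are separated by one residue.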