SI, Jun 2
Evidence-Aware Protein Complex Detection: Methods, Benchmarks, and Reproducibility Challenges
Sima Soltani, Mehrdad Jalali, Yahya Forghani et al.
Protein complexes are central units of cellular organization, yet their identification from protein-protein interaction (PPI) networks remains difficult because interactome maps are noisy, incomplete, context dependent, and unevenly annotated. This focused methodological review examines evidence-aware approaches that combine PPI topology with Gene Ontology (GO) annotations, expression profiles, subcellular localization, sequence or domain evidence, temporal information, and representation learning, with emphasis on post-2018 methods and selected historical baselines. The central synthesis is that transparent evidence-aware graph methods currently offer the strongest tradeoff between biological plausibility and reproducibility, while deep, hypergraph, and dynamic heterogeneous models expand biological realism but require stronger benchmark control. The central bottleneck is no longer only the lack of algorithms, but the lack of harmonized, overlap-aware, and reproducible evaluation protocols. We therefore recommend unified benchmark versions, explicit GO-circularity controls, overlap-aware metrics, uncertainty estimates, and executable software packages over isolated source-specific F-measure gains.
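As a rough illustration of what an overlap-aware evaluation can look like, the sketch below scores predicted complexes against a reference set. It assumes the neighborhood affinity score, NA(a, b) = |a ∩ b|² / (|a| · |b|), and a match threshold of 0.2, both common conventions in the complex-detection literature rather than choices stated in the abstract. The function names and the toy protein sets are hypothetical.

```python
from typing import Dict, Iterable, Set


def neighborhood_affinity(a: Set[str], b: Set[str]) -> float:
    """Return |a & b|^2 / (|a| * |b|), or 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def complex_f_measure(
    predicted: Iterable[Iterable[str]],
    reference: Iterable[Iterable[str]],
    threshold: float = 0.2,  # assumed cutoff, not taken from the abstract
) -> Dict[str, float]:
    """Precision, recall, and F-measure under affinity-based matching.

    A predicted complex counts as matched if it reaches the threshold
    against at least one reference complex, and vice versa. Proteins may
    appear in several complexes, so overlapping predictions are allowed.
    """
    pred = [set(p) for p in predicted]
    ref = [set(r) for r in reference]

    matched_pred = sum(
        1 for p in pred
        if any(neighborhood_affinity(p, r) >= threshold for r in ref)
    )
    matched_ref = sum(
        1 for r in ref
        if any(neighborhood_affinity(p, r) >= threshold for p in pred)
    )

    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_ref / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    # Toy data: protein "C" belongs to two predicted complexes.
    predicted = [{"A", "B", "C"}, {"C", "D"}, {"X", "Y"}]
    reference = [{"A", "B", "C", "D"}, {"E", "F"}]
    print(complex_f_measure(predicted, reference))
```

Because matching is computed per complex rather than per protein, a protein shared by several modules does not penalize the score. Results from this kind of scoring still depend on the reference set and threshold, which is one reason the review asks for unified benchmark versions.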
SI, May 20
ECHO-PPI: Trustworthy AI for Evidence-Bundled Detection of Overlapping Protein Modules in Protein-Protein Interaction Networks
Sima Soltani, Mehrdad Jalali, Yahya Forghani
Protein-protein interaction networks provide a graph-level view of cellular organization, yet their functional modules are overlapping, noisy, and difficult to interpret from cluster assignments alone. Existing community-detection methods can recover candidate protein complexes, but they rarely explain why an individual protein is assigned to a specific module or whether that assignment should be treated as core, peripheral, or uncertain. Here we introduce ECHO-PPI, an evidence-bundled framework for interpretable overlapping protein-module detection in protein-protein interaction networks. ECHO-PPI integrates weighted network topology, semantic protein profiles, and Gene Ontology evidence to identify evidence-potential nuclei, construct candidate modules, perform overlap-aware assignment, and export hierarchical confidence labels. The framework supports trustworthy computational decision support through assignment-level interpretability: each protein-module assignment is accompanied by topology, semantic, and Gene Ontology evidence scores and a hierarchical confidence label, enabling curators to inspect, rank, and triage overlapping module predictions. Evaluation on yeast protein-interaction data shows that ECHO-PPI preserves the behaviour of strong overlap-aware baselines while adding evidence-bundled auditability. Rather than claiming universal predictive superiority, ECHO-PPI addresses a complementary need: making overlapping protein-module predictions inspectable, confidence-aware, and reproducible for downstream biological interpretation.
LG, Feb 9, 2019
Distance metric learning based on structural neighborhoods for dimensionality reduction and classification performance improvement
Mostafa Razavi Ghods, Mohammad Hossein Moattar, Yahya Forghani
Distance metric learning is a fundamental topic in pattern recognition and machine learning, and it plays a pivotal role in the performance of many learning methods. One effective way to learn such a metric is from a set of labeled training samples. Data imbalance is the most important challenge facing recent methods. This research aims not only to preserve local structures but also to address imbalanced datasets. To do this, the proposed method first extracts a low-dimensional manifold from the input data. It then learns the local neighborhood structures and the relationships among data points in the ambient space, based on the adjacencies of the same points on the embedded low-dimensional manifold. Using the local neighborhood relationships extracted from the manifold space, the method learns a distance metric that minimizes the distance between similar data points and maximizes their distance from dissimilar ones. Evaluations on numerous datasets from the UCI Machine Learning Repository, and on the KDDCup98 dataset as the most imbalanced dataset, demonstrate the superiority of the proposed approach over other approaches, especially when the imbalance factor is high.
LG, Aug 17, 2013
Comment on "robustness and regularization of support vector machines" by H. Xu, et al., (Journal of Machine Learning Research, vol. 10, pp. 1485-1510, 2009, arXiv:0803.3490)
Yahya Forghani, Hadi Sadoghi Yazdi
This paper comments on the published work on robustness and regularization of support vector machines (Journal of Machine Learning Research, vol. 10, pp. 1485-1510, 2009) [arXiv:0803.3490] by H. Xu et al. The authors proposed a theorem showing that robustness in the feature space can be related directly to robustness in the sample space. In this paper, we present a counterexample that refutes their theorem.