3 Papers

LGDec 25, 2025
VAMP-Net: An Interpretable Multi-Path Framework of Genomic Permutation-Invariant Set Attention and Quality-Aware 1D-CNN for MTB Drug Resistance

Aicha Boutorh, Kamar Hibatallah Baghdadi, Anais Daoud

Genomic prediction of drug resistance in Mycobacterium tuberculosis remains challenging due to complex epistatic interactions and highly variable sequencing data quality. We present a novel Interpretable Variant-Aware Multi-Path Network (VAMP-Net) that addresses both challenges through complementary machine learning pathways. Path-1 employs a Set Attention Transformer processing permutation-invariant variant sets to capture epistatic interactions between genomic loci. Path-2 utilizes a 1D Convolutional Neural Network that analyzes Variant Call Format quality metrics to learn adaptive confidence scores. A fusion module combines both pathways for final resistance classification. We conduct comparative evaluations of unmasked versus padding-masked Set Attention Blocks, and demonstrate that our multi-path architecture achieves superior performance over baseline CNN and MLP models, with accuracy exceeding 95% and AUC around 97% for Rifampicin (RIF) and Rifabutin (RFB) resistance prediction. The framework provides dual-layer interpretability: Attention Weight Analysis reveals Epistatic networks, and Integrated Gradients (IG) was applied for critical resistance loci (notably rpoB), while gradient-based feature importance from the CNN pathway uncovers drug-specific dependencies on data quality metrics. This architecture advances clinical genomics by delivering state-of-the-art predictive performance alongside auditable interpretability at two distinct levels, genetic causality of mutation sets and technical confidence of sequencing evidence, establishing a new paradigm for robust, clinically-actionable resistance prediction.

LGDec 26, 2025
DuaDeep-SeqAffinity: Dual-Stream Deep Learning Framework for Sequence-Only Antigen-Antibody Affinity Prediction

Aicha Boutorh, Soumia Bouyahiaoui, Sara Belhadj et al.

Predicting the binding affinity between antigens and antibodies is fundamental to drug discovery and vaccine development. Traditional computational approaches often rely on experimentally determined 3D structures, which are scarce and computationally expensive to obtain. This paper introduces DuaDeep-SeqAffinity, a novel sequence-only deep learning framework that predicts affinity scores solely from their amino acid sequences using a dual-stream hybrid architecture. Our approach leverages pre-trained ESM-2 protein language model embeddings, combining 1D Convolutional Neural Networks (CNNs) for local motif detection with Transformer encoders for global contextual representation. A subsequent fusion module integrates these multi-faceted features, which are then passed to a fully connected network for final score regression. Experimental results demonstrate that DuaDeep-SeqAffinity significantly outperforms individual architectural components and existing state-of-the-art (SOTA) methods. DuaDeep achieved a superior Pearson correlation of 0.688, an R^2 of 0.460, and a Root Mean Square Error (RMSE) of 0.737, surpassing single-branch variants ESM-CNN and ESM-Transformer. Notably, the model achieved an Area Under the Curve (AUC) of 0.890, outperforming sequence-only benchmarks and even surpassing structure-sequence hybrid models. These findings prove that high-fidelity sequence embeddings can capture essential binding patterns typically reserved for structural modeling. By eliminating the reliance on 3D structures, DuaDeep-SeqAffinity provides a highly scalable and efficient solution for high-throughput screening of vast sequence libraries, significantly accelerating the therapeutic discovery pipeline.

CVMar 7
AgrI Challenge: A Data-Centric AI Competition for Cross-Team Validation in Agricultural Vision

Mohammed Brahimi, Karim Laabassi, Mohamed Seghir Hadj Ameur et al.

Machine learning models in agricultural vision often achieve high accuracy on curated datasets but fail to generalize under real field conditions due to distribution shifts between training and deployment environments. Moreover, most machine learning competitions focus primarily on model design while treating datasets as fixed resources, leaving the role of data collection practices in model generalization largely unexplored. We introduce the AgrI Challenge, a data-centric competition framework in which multiple teams independently collect field datasets, producing a heterogeneous multi-source benchmark that reflects realistic variability in acquisition conditions. To systematically evaluate cross-domain generalization across independently collected datasets, we propose Cross-Team Validation (CTV), an evaluation paradigm that treats each team's dataset as a distinct domain. CTV includes two complementary protocols: Train-on-One-Team-Only (TOTO), which measures single-source generalization, and Leave-One-Team-Out (LOTO), which evaluates collaborative multi-source training. Experiments reveal substantial generalization gaps under single-source training: models achieve near-perfect validation accuracy yet exhibit validation-test gaps of up to 16.20% (DenseNet121) and 11.37% (Swin Transformer) when evaluated on datasets collected by other teams. In contrast, collaborative multi-source training dramatically improves robustness, reducing the gap to 2.82% and 1.78%, respectively. The challenge also produced a publicly available dataset of 50,673 field images of six tree species collected by twelve independent teams, providing a diverse benchmark for studying domain shift and data-centric learning in agricultural vision.