QM AI, Apr 17

Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery

arXiv:2604.14334

AI Analysis

For computational biology researchers, this work shows that LLM reasoning can improve downstream predictive performance in feature selection, but it also reveals a gap between that performance and how faithfully the selected genes match validated biomarkers, highlighting the need to evaluate both.

LLM chain-of-thought reasoning filters tissue-composition confounders from gradient-saliency gene lists. The LLM-filtered 17-gene set achieved AUC 0.927 on TCGA-BRCA, surpassing both a 5,000-gene variance baseline (AUC 0.903) and the raw 50-gene saliency set (AUC 0.832). However, only 6 of the 17 selected genes (35.3%) were validated biomarkers, a pattern the authors call selective faithfulness.

Gradient saliency from deep sequence models surfaces candidate biomarkers efficiently, but the resulting gene lists can be contaminated by tissue-composition confounders that degrade downstream classifiers. We study whether LLM chain-of-thought (CoT) reasoning can filter these confounders, and whether reasoning quality is associated with downstream performance. We train a Mamba SSM on TCGA-BRCA RNA-seq and extract the top-50 genes by gradient saliency; DeepSeek-R1 evaluates every candidate with structured CoT to produce a final 17-gene set. On the held-out test split, the raw 50-gene saliency set (no LLM) performs worse than a 5,000-gene variance baseline (AUC 0.832 vs. 0.903), while the LLM-filtered set surpasses it (AUC 0.927), using 294x fewer features. A faithfulness audit (COSMIC CGC, OncoKB, PAM50) shows that 6 of 17 selected genes (35.3%) are validated BRCA biomarkers, while 10 of 16 known BRCA genes present in the input, including FOXA1, were missed. This divergence between downstream performance and reasoning faithfulness suggests selective faithfulness in this setting: targeted confounder removal can improve predictive performance without comprehensive recall.
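The audit figures in the abstract reduce to simple set arithmetic, which the following minimal Python sketch reproduces. It is an illustration only, not the authors' code: the function name `audit_selection` and the gene identifiers (`G00` to `G49`) are hypothetical placeholders chosen so that the set sizes match the reported counts (50 candidates, 16 known BRCA genes among them, 17 selected, 6 overlapping). The recall value is not stated in the abstract; it follows from the reported "10 of 16 missed".

```python
def audit_selection(selected, candidates, known_biomarkers):
    """Compare an LLM-filtered gene set against a reference biomarker list.

    Precision: share of selected genes that are validated biomarkers.
    Recall: share of validated biomarkers in the candidate list that were kept.
    """
    selected = set(selected)
    known_in_input = set(candidates) & set(known_biomarkers)
    hits = selected & known_in_input
    missed = known_in_input - selected
    precision = len(hits) / len(selected) if selected else 0.0
    recall = len(hits) / len(known_in_input) if known_in_input else 0.0
    return {"hits": hits, "missed": missed,
            "precision": precision, "recall": recall}


# Placeholder identifiers only (NOT real gene names); the sizes mirror the abstract.
candidates = [f"G{i:02d}" for i in range(50)]   # top-50 genes by saliency
known = candidates[:16]                         # 16 known BRCA genes in the input
selected = candidates[:6] + candidates[16:27]   # 6 validated + 11 other = 17 genes

report = audit_selection(selected, candidates, known)
feature_reduction = 5000 / len(selected)        # variance baseline vs. filtered set

print(f"precision = {report['precision']:.1%}")          # 6 / 17
print(f"recall    = {report['recall']:.1%}")             # 6 / 16
print(f"missed    = {len(report['missed'])} genes")
print(f"feature reduction = {feature_reduction:.0f}x")   # 5000 / 17
```

Under these counts, precision is 6/17 (about 35.3%), recall is 6/16 (37.5%), and 5,000/17 is roughly 294, matching the "294x fewer features" figure. Reporting precision and recall side by side makes the selective-faithfulness pattern explicit: the filtered set predicts well even though it recovers under half of the known biomarkers that were available to it.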
