AI, CE, QM | Mar 2

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

arXiv:2603.01396v1 | 1 citation | h-index: 3
Originality: Highly original
AI Analysis

HarmonyCell addresses the dual heterogeneity bottlenecks in single-cell perturbation studies, enabling scalable automatic virtual cell modeling without dataset-specific engineering.

The paper automates single-cell perturbation modeling under semantic and distribution shifts with HarmonyCell, which achieves a 95% valid execution rate on heterogeneous datasets and matches or exceeds expert-designed baselines in out-of-distribution evaluations.

Single-cell perturbation studies face dual heterogeneity bottlenecks: (i) semantic heterogeneity, where identical biological concepts are encoded under incompatible metadata schemas across datasets; and (ii) statistical heterogeneity, where distribution shifts from biological variation demand dataset-specific inductive biases. We propose HarmonyCell, an end-to-end agent framework that resolves each challenge through a dedicated mechanism: an LLM-driven Semantic Unifier autonomously maps disparate metadata into a canonical interface without manual intervention, and an adaptive Monte Carlo Tree Search engine operates over a hierarchical action space to synthesize architectures with optimal statistical inductive biases for distribution shifts. Evaluated across diverse perturbation tasks under both semantic and distribution shifts, HarmonyCell achieves a 95% valid execution rate on heterogeneous input datasets (versus 0% for general agents) while matching or even exceeding expert-designed baselines in rigorous out-of-distribution evaluations. This dual-track orchestration enables scalable automatic virtual cell modeling without dataset-specific engineering.
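To make the idea of tree search over a hierarchical action space concrete, the following is a minimal sketch of plain UCT-style Monte Carlo Tree Search, not the paper's adaptive variant. Everything specific in it is an assumption for illustration: the two decision levels (`ACTION_LEVELS`), the option names, and the `TOY_SCORES` table, which stands in for an out-of-distribution validation score that a real system would obtain by training and evaluating each candidate architecture.

```python
import math
import random

# Hypothetical two-level action space (not taken from the paper):
# level 1 picks a backbone family, level 2 picks how the perturbation is injected.
ACTION_LEVELS = [
    ["mlp", "vae", "transformer"],
    ["concat", "film"],
]

# Made-up scores standing in for an out-of-distribution validation metric.
TOY_SCORES = {
    ("mlp", "concat"): 0.40, ("mlp", "film"): 0.45,
    ("vae", "concat"): 0.55, ("vae", "film"): 0.80,
    ("transformer", "concat"): 0.60, ("transformer", "film"): 0.50,
}


class Node:
    def __init__(self, path=()):
        self.path = path      # actions chosen so far, one per level
        self.children = {}    # action -> Node
        self.visits = 0
        self.total = 0.0      # sum of rewards backed up through this node

    def is_terminal(self):
        return len(self.path) == len(ACTION_LEVELS)

    def untried_actions(self):
        if self.is_terminal():
            return []
        return [a for a in ACTION_LEVELS[len(self.path)] if a not in self.children]


def ucb1(parent, child, c):
    # Mean reward plus an exploration bonus that shrinks as the child is visited.
    exploit = child.total / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def search(evaluate, iterations=2000, c=0.5, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node, trail = root, [root]
        # Selection: follow UCB1 while the current node is fully expanded.
        while not node.is_terminal() and not node.untried_actions():
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(parent, ch, c))
            trail.append(node)
        # Expansion: add one untried child, if any remain.
        untried = node.untried_actions()
        if untried:
            action = rng.choice(untried)
            node.children[action] = Node(node.path + (action,))
            node = node.children[action]
            trail.append(node)
        # Rollout: complete the remaining levels at random, then score the design.
        path = node.path
        while len(path) < len(ACTION_LEVELS):
            path += (rng.choice(ACTION_LEVELS[len(path)]),)
        reward = evaluate(path)
        # Backpropagation: update every node on the selected trail.
        for visited in trail:
            visited.visits += 1
            visited.total += reward
    # Report the most-visited path from the root.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.path


best_path = search(lambda path: TOY_SCORES[path])
print(best_path)
```

In this toy setting the evaluation is a table lookup, so the search is cheap. In a realistic setting each `evaluate` call would be expensive, which is the usual motivation for spending the evaluation budget on promising branches of the hierarchy instead of enumerating every combination.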

