Empowering Machines to Think Like Chemists: Unveiling Molecular Structure-Polarity Relationships with Hierarchical Symbolic Regression
This work addresses the trade-off between expressiveness and interpretability in AI models that chemists use to analyze molecular polarity; it represents an incremental improvement over existing methods.
The paper tackles the challenge of interpretability in predictive models for thin-layer chromatography (TLC) by introducing Unsupervised Hierarchical Symbolic Regression (UHiSR), which automatically distills chemically intuitive polarity indices and discovers interpretable equations linking molecular structure to chromatographic behavior.
Thin-layer chromatography (TLC) is a crucial technique in molecular polarity analysis. Despite its importance, the interpretability of predictive models for TLC, especially those driven by artificial intelligence, remains a challenge. Current approaches, which rely on either high-dimensional molecular fingerprints or domain-knowledge-driven feature engineering, often face a dilemma between expressiveness and interpretability. To bridge this gap, we introduce Unsupervised Hierarchical Symbolic Regression (UHiSR), which combines hierarchical neural networks and symbolic regression. UHiSR automatically distills chemically intuitive polarity indices and discovers interpretable equations that link molecular structure to chromatographic behavior.
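
To make the term "symbolic regression" concrete, the following is a minimal sketch of the general idea only: search a small library of candidate functional forms and keep the one that best fits the data. It is not the UHiSR method and does not illustrate its hierarchical neural network component. The candidate library, the function names (`fit_linear`, `symbolic_fit`), and the data are all hypothetical; the data are synthetic numbers with no chemical meaning.

```python
import math

# Hypothetical candidate library: name -> unary function of x.
# A real symbolic regression tool searches a far larger expression space.
CANDIDATES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "log(x)": math.log,
    "exp(-x)": lambda x: math.exp(-x),
}


def fit_linear(fx, ys):
    """Closed-form least squares for ys ~ a * fx + b."""
    n = len(fx)
    mean_f = sum(fx) / n
    mean_y = sum(ys) / n
    var_f = sum((f - mean_f) ** 2 for f in fx)
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(fx, ys))
    a = cov / var_f
    b = mean_y - a * mean_f
    return a, b


def symbolic_fit(xs, ys):
    """Return (mse, name, a, b) for the candidate form with the lowest error."""
    best = None
    for name, fn in CANDIDATES.items():
        fx = [fn(x) for x in xs]
        a, b = fit_linear(fx, ys)
        mse = sum((a * f + b - y) ** 2 for f, y in zip(fx, ys)) / len(ys)
        if best is None or mse < best[0]:
            best = (mse, name, a, b)
    return best


# Synthetic data generated from y = 2 * log(x) + 1 (assumed, for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * math.log(x) + 1.0 for x in xs]

mse, name, a, b = symbolic_fit(xs, ys)
print(f"best form: y = {a:.3f} * {name} + {b:.3f}")
```

The appeal of this family of methods is that the output is a readable equation rather than an opaque set of weights, which is the interpretability property the abstract emphasizes.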