AIBM | May 24, 2025

Chemical classification program synthesis using generative artificial intelligence

arXiv:2505.18470v2 | 1 citation | h-index: 5 | Journal of Cheminformatics
Originality: Incremental advance
AI Analysis

This work addresses the need for scalable and explainable chemical classification for cheminformatics and bioinformatics, though it is incremental as it builds on existing methods without achieving state-of-the-art performance.

The authors tackled the problem of automating chemical structure classification by using generative AI to write classifier programs for the ChEBI database, resulting in a system (C3PO) that outperforms a naive classifier but falls short of deep learning methods while offering explainability and reduced data dependence.

Accurately classifying chemical structures is essential for cheminformatics and bioinformatics, including tasks such as identifying bioactive compounds of interest, screening molecules for toxicity to humans, finding non-organic compounds with desirable material properties, or organizing large chemical libraries for drug discovery or environmental monitoring. However, manual classification is labor-intensive and difficult to scale to large chemical databases. Existing automated approaches either rely on manually constructed classification rules or use deep learning methods that lack explainability. This work presents an approach that uses generative artificial intelligence to automatically write chemical classifier programs for classes in the Chemical Entities of Biological Interest (ChEBI) database. These programs can be used for efficient deterministic run-time classification of SMILES structures, with natural language explanations. The programs themselves constitute an explainable computable ontological model of chemical class nomenclature, which we call the ChEBI Chemical Class Program Ontology (C3PO). We validated our approach against the ChEBI database, and compared our results against deep learning models and a naive SMARTS-pattern-based classifier. C3PO outperforms the naive classifier, but does not reach the performance of state-of-the-art deep learning methods. However, C3PO has a number of strengths that complement deep learning methods, including explainability and reduced data dependence. C3PO can be used alongside deep learning classifiers to provide an explanation of the classification where both methods agree. The programs can be used as part of the ontology development process, and iteratively refined by expert human curators.
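The pairing of a classifier program with a deep learning model described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the C3PO implementation: the `ProgramResult` type, the `explain_if_agree` helper, the 0.5 threshold, and the toy string-matching program are all hypothetical. A real classifier program would use proper substructure matching on the parsed molecule rather than a text search on the SMILES string.

```python
from typing import Callable, NamedTuple, Optional


class ProgramResult(NamedTuple):
    """Hypothetical output shape: a membership decision plus a plain-text reason."""
    is_member: bool
    reason: str


def toy_program(smiles: str) -> ProgramResult:
    """Toy stand-in for a generated classifier program.

    ASSUMPTION: a naive text search on the SMILES string, used only so the
    sketch runs without a chemistry toolkit. It is not chemically reliable.
    """
    pattern = "C(=O)O"
    if pattern in smiles:
        return ProgramResult(True, f"contains the text pattern {pattern}")
    return ProgramResult(False, f"text pattern {pattern} not found")


def explain_if_agree(
    smiles: str,
    program: Callable[[str], ProgramResult],
    dl_probability: float,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Optional[str]:
    """Return the program's explanation only when both classifiers agree."""
    result = program(smiles)
    dl_is_member = dl_probability >= threshold
    if result.is_member == dl_is_member:
        return result.reason
    return None  # the classifiers disagree, so no explanation is offered


if __name__ == "__main__":
    # The deep learning probabilities here are made-up inputs, not model outputs.
    print(explain_if_agree("CC(=O)O", toy_program, dl_probability=0.9))
    print(explain_if_agree("CC(=O)O", toy_program, dl_probability=0.1))
```

Because the program is deterministic, the same SMILES string always yields the same decision and reason. Returning `None` on disagreement keeps the explanation from being attached to a prediction it does not support.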

