BM, CL, IR, LG, ML | Mar 3, 2024

When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings

arXiv:2403.12984v2 | 1 citation | h-index: 8 | Has Code | Tiny Papers @ ICLR
Originality: Synthesis-oriented
AI Analysis

This work shows that simple text-based approaches can be effective for drug classification, though the contribution is incremental in nature.

The authors tackled drug classification by treating SMILES strings as text and applying basic NLP methods, achieving competitive scores without complex representations.

Complex chemical structures, such as drugs, are usually described by SMILES strings, which encode a molecule as a sequence of atoms and bonds. These strings are used in a range of complex machine learning-based drug research and representation-learning work. Setting complex representations aside, in this work we pose a single question: What if we treat drug SMILES as conventional sentences and apply text classification to drug classification? Our experiments affirm the possibility with very competitive scores. The study treats each atom and bond as a sentence component and employs basic NLP methods to categorize drug types, showing that complex problems can also be approached from simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP.
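To make the idea of treating each atom and bond as a sentence component concrete, here is a minimal Python sketch. It is not taken from the linked repository. The regular expression, the function names `tokenize` and `bag_of_ngrams`, and the choice of unigram plus bigram counts are all illustrative assumptions. The tokenizer covers only a common subset of SMILES syntax.

```python
import re
from collections import Counter

# Assumed, simplified SMILES token pattern (not the authors' tokenizer).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms, e.g. [NH4+] or [C@@H]
    r"|Br|Cl"                # two-letter atoms, tried before single letters
    r"|[BCNOSPFI]"           # one-letter aliphatic atoms
    r"|[bcnosp]"             # aromatic atoms
    r"|%\d{2}"               # two-digit ring-closure labels
    r"|\d"                   # single-digit ring-closure labels
    r"|[=#$:/\\\-+.()~*@]"   # bonds, branches, and other symbols
)


def tokenize(smiles):
    """Split a SMILES string into atom, bond, and structural tokens ("words")."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse input that the simplified pattern cannot fully cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def bag_of_ngrams(smiles, max_n=2):
    """Count token n-grams (n = 1..max_n), as one would for words in a sentence."""
    tokens = tokenize(smiles)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    # Acetyl chloride, used here only as a short tokenization example.
    print(tokenize("CC(=O)Cl"))
    print(bag_of_ngrams("CCO"))
```

Count vectors of this kind could then be passed to any standard text classifier, such as TF-IDF weighting followed by a linear model. Which features and classifiers the paper actually uses should be checked against the repository.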

Code Implementations: 1 repo