QMLGAug 11, 2025

Exploring Molecular Odor Taxonomies for Structure-based Odor Predictions using Machine Learning

arXiv:2508.09217v1h-index: 22
Originality Incremental advance
AI Analysis

This work addresses the problem of odor prediction for perfumery and chemistry by providing incremental improvements through taxonomy integration.

The study tackled the challenge of predicting odor from molecular structure by demonstrating that both expert and data-driven odor taxonomies improve machine learning model performance, with the data-driven taxonomy enabling critical evaluation of the expert taxonomy.

One of the key challenges to predict odor from molecular structure is unarguably our limited understanding of the odor space and the complexity of the underlying structure-odor relationships. Here, we show that the predictive performance of machine learning models for structure-based odor predictions can be improved using both, an expert and a data-driven odor taxonomy. The expert taxonomy is based on semantic and perceptual similarities, while the data-driven taxonomy is based on clustering co-occurrence patterns of odor descriptors directly from the prepared dataset. Both taxonomies improve the predictions of different machine learning models and outperform random groupings of descriptors that do not reflect existing relations between odor descriptors. We assess the quality of both taxonomies through their predictive performance across different odor classes and perform an in-depth error analysis highlighting the complexity of odor-structure relationships and identifying potential inconsistencies within the taxonomies by showcasing pear odorants used in perfumery. The data-driven taxonomy allows us to critically evaluate our expert taxonomy and better understand the molecular odor space. Both taxonomies as well as a full dataset are made available to the community, providing a stepping stone for a future community-driven exploration of the molecular basis of smell. In addition, we provide a detailed multi-layer expert taxonomy including a total of 777 different descriptors from the Pyrfume repository.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes