Jun 24, 2025

A standard transformer and attention with linear biases for molecular conformer generation

arXiv:2506.19834v1
h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and scalable conformer generation in drug discovery and offers a potential foundation for new generative models. It is incremental, however, because it builds on existing transformer and positional encoding techniques.

The paper tackles the problem of generating low-energy molecular conformations. It shows that a standard transformer with a graph-aware relative positional encoding (linear attention biases based on shortest-path graph distances) can outperform a larger non-equivariant model: at 25 million parameters, it surpasses the current state-of-the-art non-equivariant base model, which has 64 million parameters, on the GEOM-DRUGS benchmark.

Sampling low-energy molecular conformations, spatial arrangements of atoms in a molecule, is a critical task for many different calculations performed in the drug discovery and optimization process. Numerous specialized equivariant networks have been designed to generate molecular conformations from 2D molecular graphs. Recently, non-equivariant transformer models have emerged as a viable alternative due to their capability to scale to improve generalization. However, the concern has been that non-equivariant models require a large model size to compensate for the lack of equivariant bias. In this paper, we demonstrate that a well-chosen positional encoding effectively addresses these size limitations. A standard transformer model incorporating relative positional encoding for molecular graphs, when scaled to 25 million parameters, surpasses the current state-of-the-art non-equivariant base model with 64 million parameters on the GEOM-DRUGS benchmark. We implemented relative positional encoding as a negative attention bias that linearly increases with the shortest path distances between graph nodes at varying slopes for different attention heads, similar to ALiBi, a widely adopted relative positional encoding technique in the NLP domain. This architecture has the potential to serve as a foundation for a novel class of generative models for molecular conformations.
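
To make the positional encoding described in the abstract concrete, here is a minimal sketch of a per-head attention bias that grows more negative, linearly, with the shortest-path distance between atoms. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the slopes follow ALiBi's geometric schedule (the abstract only says slopes vary across heads), and the molecular graph is assumed to be connected.

```python
from collections import deque

import numpy as np


def shortest_path_distances(adjacency):
    """All-pairs shortest-path lengths (in bonds), via BFS from every atom."""
    n = len(adjacency)
    dist = np.full((n, n), np.inf)
    for source in range(n):
        dist[source, source] = 0.0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adjacency[u]):
                if np.isinf(dist[source, v]):  # first visit is the shortest path
                    dist[source, v] = dist[source, u] + 1.0
                    queue.append(v)
    return dist


def head_slopes(n_heads):
    """ASSUMPTION: ALiBi-style geometric slopes 2^(-8k/n_heads), k = 1..n_heads."""
    k = np.arange(1, n_heads + 1)
    return 2.0 ** (-8.0 * k / n_heads)


def graph_attention_bias(adjacency, n_heads):
    """Bias of shape (n_heads, n_atoms, n_atoms): -slope[h] * distance[i, j]."""
    dist = shortest_path_distances(adjacency)
    slopes = head_slopes(n_heads)
    return -slopes[:, None, None] * dist[None, :, :]


# Toy molecule: a 4-atom chain 0-1-2-3 (hypothetical example graph).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
bias = graph_attention_bias(adjacency, n_heads=4)
```

In a transformer layer, a bias like this would be added to the pre-softmax attention logits of each head, so heads with steep slopes attend mostly to nearby atoms in the graph while heads with shallow slopes retain a broader view.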
