LG · AI · May 15, 2024

An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding

arXiv:2405.09585v3 · 4 citations · h-index: 8 · IJCAI
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of improving genomic selection for crop breeders by enhancing Transformer robustness, though it appears incremental as it builds on existing deep learning trends with simple modifications.

The paper tackles the challenge of applying Transformers to genomic selection in crop breeding, where datasets combine long sequences with limited samples. It proposes a simple framework built on k-mer tokenization and random masking, which achieves overall superior performance compared to seminal methods on the rice3k and wheat3k datasets.

Genomic selection (GS), as a critical crop breeding strategy, plays a key role in enhancing food production and addressing the global hunger crisis. The predominant approaches in GS currently revolve around employing statistical methods for prediction. However, statistical methods often come with two main limitations: strong statistical priors and linear assumptions. A recent trend is to capture the non-linear relationships between markers with deep learning. However, as crop datasets are commonly long sequences with limited samples, the robustness of deep learning models, especially Transformers, remains a challenge. In this work, to unleash the unexplored potential of the attention mechanism for the task of interest, we propose a simple yet effective Transformer-based framework that enables end-to-end training of the whole sequence. Via experiments on rice3k and wheat3k datasets, we show that, with simple tricks such as k-mer tokenization and random masking, a Transformer can achieve overall superior performance against seminal methods on GS tasks of interest.
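The two tricks named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the marker sequence arrives as a plain string, that k-mers are non-overlapping, and that masking replaces each token independently with probability `p`. The names `kmer_tokenize`, `random_mask`, and the `<mask>` token are hypothetical and chosen only for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def kmer_tokenize(seq, k=3):
    """Split a sequence into non-overlapping k-mers.

    Assumption: a trailing remainder shorter than k is kept as its own token.
    Grouping k symbols per token shortens the input by roughly a factor of k.
    """
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def random_mask(tokens, p=0.15, rng=None):
    """Replace each token with MASK independently with probability p."""
    rng = rng or random.Random()
    return [MASK if rng.random() < p else t for t in tokens]


tokens = kmer_tokenize("ACGTACGTAC", k=3)
print(tokens)  # ['ACG', 'TAC', 'GTA', 'C']

# A seeded generator makes the masking reproducible across runs.
masked = random_mask(tokens, p=0.5, rng=random.Random(0))
print(masked)
```

Under these assumptions, tokenization reduces the sequence length the attention layers must handle, while masking acts as a form of input noise during training, which is one plausible way to improve robustness when samples are scarce.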

