NE | Oct 14, 2021

Analysis of the first Genetic Engineering Attribution Challenge

arXiv:2110.11242v1 | 10 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of attributing engineered biological sequences to their designers, which matters for accountability and credit in biotechnology. Its results mark an advance on a competitive benchmark.

The paper presents results from the first Genetic Engineering Attribution Challenge. Top teams substantially improved accuracy in identifying the lab-of-origin of engineered biological sequences, raising top-1 and top-10 accuracy by 10 percentage points over previous models, and an ensemble of the prizewinning models improved performance further.

The ability to identify the designer of engineered biological sequences -- termed genetic engineering attribution (GEA) -- would help ensure due credit for biotechnological innovation, while holding designers accountable to the communities they affect. Here, we present the results of the first Genetic Engineering Attribution Challenge, a public data-science competition to advance GEA. Top-scoring teams dramatically outperformed previous models at identifying the true lab-of-origin of engineered sequences, including an increase in top-1 and top-10 accuracy of 10 percentage points. A simple ensemble of prizewinning models further increased performance. New metrics, designed to assess a model's ability to confidently exclude candidate labs, also showed major improvements, especially for the ensemble. Most winning teams adopted CNN-based machine-learning approaches; however, one team achieved very high accuracy with an extremely fast neural-network-free approach. Future work, including future competitions, should further explore a wide diversity of approaches for bringing GEA technology into practical use.
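The abstract reports results as top-1 and top-10 accuracy and mentions a simple ensemble of prizewinning models. The sketch below illustrates what those terms mean on toy data. It is not the challenge's scoring code: the function names, the toy probabilities, and the choice to combine models by averaging their per-lab probabilities are all assumptions made for illustration, since the abstract does not say how the ensemble was built.

```python
from typing import List, Sequence


def top_k_accuracy(probs: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of sequences whose true lab is among the k highest-scoring labs."""
    hits = 0
    for row, true_lab in zip(probs, labels):
        # Indices of the k labs with the highest predicted probability.
        ranked = sorted(range(len(row)), key=lambda lab: row[lab], reverse=True)[:k]
        hits += true_lab in ranked
    return hits / len(labels)


def average_ensemble(models: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-lab probabilities across models (an assumed combination rule)."""
    n_models = len(models)
    return [
        [sum(vals) / n_models for vals in zip(*rows)]  # one average per lab
        for rows in zip(*models)                       # one group of rows per sequence
    ]


# Toy data (hypothetical): 3 sequences, 4 candidate labs, 2 models.
model_a = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.5, 0.3, 0.1], [0.4, 0.3, 0.2, 0.1]]
model_b = [[0.5, 0.3, 0.1, 0.1], [0.2, 0.2, 0.5, 0.1], [0.1, 0.2, 0.6, 0.1]]
true_labs = [0, 2, 2]

ensemble = average_ensemble([model_a, model_b])

for name, preds in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(name, top_k_accuracy(preds, true_labs, k=1), top_k_accuracy(preds, true_labs, k=2))
```

With many candidate labs, top-10 accuracy credits a model whenever the true lab appears anywhere in its ten highest-ranked guesses, which is why it is reported alongside the stricter top-1 figure.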

Code Implementations: 1 repo
