Tianyu Liu, Wangjie Zheng, Rui Yang et al.
For clinicians diagnosing rare diseases, Hygieia reduces diagnostic delays and workload while improving accuracy, validated with real-world cases from Yale and Duke-NUS.
Genomics, gene expression, sequencing
Nicolas Huynh, Krzysztof Kacprzyk, Ryan Sheridan et al.
For genomic researchers needing interpretable models, DEFT provides a novel method that combines the interpretability of decision trees with the expressivity of deep learning, addressing the bottleneck of tree depth in sequence analysis.
Lei Huang, Chuan Qiu, Kuan-Jui Su et al.
This provides a scalable and robust solution for genotype imputation in genomic studies, addressing ancestry bias and limited rare-variant accuracy, though it is incremental in that it adapts existing transformer methods to this domain.
Junda Ying, Yuxuan Wang, Bowen Yang et al.
For computational biologists studying cellular differentiation and lineage branching, USB provides a rigorous microscopic interpretation of birth-death events, addressing a key limitation of existing continuous optimal transport methods.
Xingzhong Zhao, Ziqian Xie, Islam et al.
For bioinformaticians analyzing large phenotype panels (e.g., from imaging or representation learning), TorchGWAS makes large-scale GWAS screening practical where existing tools are too slow.
Han Zhang, Guo-Hua Yuan, Chaohao Yuan et al.
This work provides a generative cellular world model for in silico simulation of cell states and perturbation responses, addressing the need for virtual cells in biological discovery and perturbation screening.
Yusen Hou, Weicai Long, Haitao Hu et al.
This addresses the need for better tools in microbiology and biotechnology by assessing LLMs' potential for genomic interpretation, though it is incremental, focusing on benchmarking rather than a new model.
Yuheng Liang, Lucy Chuo, Ahmadreza Argha et al.
This highlights a critical problem for cancer therapy: accurate pre-treatment prediction is needed to address patient resistance. The findings are incremental, showing that current models lack robustness.
Gongxu Luo, Boyang Sun, Kun Zhang
For computational biologists using bulk gene expression data to infer causal networks, the paper provides theoretical and empirical evidence that such recovery is generally unreliable without strong linearity assumptions.
Daria Ledneva, Denis Kuznetsov
For genomic modeling, this work addresses the limitation of fixed tokenization by enabling adaptive boundaries, showing strong gains on histone tasks and in biological interpretability.
Taewon Kim, Jihwan Shin, Hyomin Kim et al.
For researchers using DNA language models, this work addresses the brittleness of fixed tokenization under genomic variation, offering a more robust representation.
Maciej Sypetkowski, Joanna Krawczyk, Łukasz Smoliński et al.
For biologists and computational researchers, OmicsLM bridges the gap between quantitative omics data and natural-language reasoning, enabling more interpretable and flexible analysis of transcriptomic data.
Lei Huang, Hui Shen, Kuan-Jui Su et al.
For researchers using genotype-based expression prediction and TWAS, this method improves prediction accuracy and biological interpretability by leveraging LD structure and functional priors.
Daria Ledneva, Mikhail Nuridinov, Denis Kuznetsov
Provides a standardized evaluation framework for the genomic ML community to enable principled model comparison and selection.
Ziwei Huang, Zeyuan Song, Paola Sebastiani et al.
For researchers analyzing high-dimensional data with limited samples, RSNet offers a versatile tool for statistically reliable and interpretable network inference, though it is an incremental contribution as a software package.
Yi Duan, Zhao Yang, Jiwei Zhu et al.
Provides interpretable and accurate prediction of DNA regulatory activity for biologists studying gene expression, with explicit mechanistic explanations.
Chenglei Yu, Chuanrui Wang, Bangyan Liao et al.
For computational biologists studying cellular development, PACE provides a principled method to infer trajectories from asynchronous snapshot data without requiring explicit cell pairing or lineage tracing.
Abhijoy Sarkar, Aarchi Singh Thakur
For researchers modeling cancer evolution, this benchmark establishes a reproducible baseline and identifies the need for serial ctDNA data.
Jianan Zhao, Xixian Liu, Zhihao Zhan et al.
For researchers in genomics and long-context DNA modeling, GeneZip provides an efficient compression method that reduces computational costs and enables larger models, but it is an incremental improvement over existing encoder-based compressors.
Kamila Szewczyk, Sven Rahmann
This addresses the problem of efficient genomic data storage and access for bioinformatics researchers, offering incremental improvements in speed and compression.