Surag Nair

37.1LGAug 15, 2024Code

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

Xiner Li, Yulai Zhao, Chenyu Wang et al. · princeton

Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require ``differentiable'' proxy models (\textit{e.g.}, classifier guidance or DPS) or involve computationally expensive fine-tuning of diffusion models (\textit{e.g.}, classifier-free guidance, RL-based fine-tuning). In our work, we propose a new method to address these challenges. Our algorithm is an iterative sampling method that integrates soft value functions, which looks ahead to how intermediate noisy states lead to high rewards in the future, into the standard inference procedure of pre-trained diffusion models. Notably, our approach avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly utilize non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way. Finally, we demonstrate the effectiveness of our algorithm across several domains, including image generation, molecule generation, and DNA/RNA sequence generation. The code is available at \href{https://github.com/masa-ue/SVDD}{https://github.com/masa-ue/SVDD}.

4.1LGOct 31, 2018Code

Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.6.5

Avanti Shrikumar, Katherine Tian, Žiga Avsec et al.

TF-MoDISco (Transcription Factor Motif Discovery from Importance Scores) is an algorithm for identifying motifs from basepair-level importance scores computed on genomic sequence data. This technical note focuses on version v0.5.6.5. The implementation is available at https://github.com/kundajelab/tfmodisco/tree/v0.5.6.5

0.3CLMay 10, 2017

A Biomedical Information Extraction Primer for NLP Researchers

Surag Nair

Biomedical Information Extraction is an exciting field at the crossroads of Natural Language Processing, Biology and Medicine. It encompasses a variety of different tasks that require application of state-of-the-art NLP techniques, such as NER and Relation Extraction. This paper provides an overview of the problems in the field and discusses some of the techniques used for solving them.

Surag Nair

3 Papers