Tinson Xu

LG
h-index4
3papers
1citation
Novelty62%
AI Score49

3 Papers

LGMay 2Code
NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training

Fang Wu, Haokai Zhao, Da Xing et al.

Diffusion models have achieved remarkable success across a wide range of generative tasks, yet their training paradigm largely treats injected noise as uniformly informative. In this work, we challenge this assumption and introduce NoiseRater, a meta-learning framework for instance-level noise valuation in diffusion model training. We propose a parametric noise rater that assigns importance scores to individual noise realizations conditioned on data and timestep, enabling adaptive reweighting of the training objective. The rater is trained via bilevel optimization to improve downstream validation performance after inner-loop diffusion updates. To enable efficient deployment, we further design a decoupled two-stage pipeline that transitions from soft weighting during meta-training to hard noise selection during standard training. Extensive experiments on FFHQ and ImageNet demonstrate that not all noise samples contribute equally, and that prioritizing informative noise improves both training efficiency and generation quality. Our results establish noise valuation as a complementary and previously underexplored axis for improving diffusion model training. Our code is available at: https://anonymous.4open.science/r/NoiseRater-DEB116.

LGMay 1
Proteo-R1: Reasoning Foundation Models for De Novo Protein Design

Fang Wu, Weihao Xuan, Heli Qi et al.

Deep learning in \emph{de novo} protein design has achieved atomic-level fidelity. However, existing models remain largely non-deliberative: they directly synthesize molecular geometries without explicitly reasoning about which residues or interactions are functionally essential. As a result, design decisions are entangled with continuous sampling dynamics, limiting interpretability, controllability, and systematic reuse of biochemical knowledge. We introduce \textbf{Proteo-R1}, a reasoning-guided protein design framework that explicitly decouples \emph{molecular understanding} from \emph{geometric generation}. Proteo-R1 adopts a dual-expert architecture in which a multimodal large language model (MLLM) serves as an \emph{understanding expert}, analyzing protein sequences, structures, and textual context to identify key functional residues that govern binding and specificity. These residue-level decisions are then passed as hard constraints to a separate diffusion-based \emph{generation expert}, which performs conditional co-design while respecting the fixed interaction anchors. This factorization mirrors how human experts approach molecular engineering: first, reasoning about critical interactions, then optimizing geometry subject to those constraints. By operationalizing reasoning as explicit residue-level commitments rather than latent textual guidance, Proteo-R1 achieves stable, interpretable, and modular integration of LLM reasoning with state-of-the-art geometric generative models. Code, data, and demos are available at https://smiles724.github.io/r1/.

LGSep 14, 2025
FragmentGPT: A Unified GPT Model for Fragment Growing, Linking, and Merging in Molecular Design

Xuefeng Liu, Songhao Jiang, Qinan Huang et al.

Fragment-Based Drug Discovery (FBDD) is a popular approach in early drug development, but designing effective linkers to combine disconnected molecular fragments into chemically and pharmacologically viable candidates remains challenging. Further complexity arises when fragments contain structural redundancies, like duplicate rings, which cannot be addressed by simply adding or removing atoms or bonds. To address these challenges in a unified framework, we introduce FragmentGPT, which integrates two core components: (1) a novel chemically-aware, energy-based bond cleavage pre-training strategy that equips the GPT-based model with fragment growing, linking, and merging capabilities, and (2) a novel Reward Ranked Alignment with Expert Exploration (RAE) algorithm that combines expert imitation learning for diversity enhancement, data selection and augmentation for Pareto and composite score optimality, and Supervised Fine-Tuning (SFT) to align the learner policy with multi-objective goals. Conditioned on fragment pairs, FragmentGPT generates linkers that connect diverse molecular subunits while simultaneously optimizing for multiple pharmaceutical goals. It also learns to resolve structural redundancies-such as duplicated fragments-through intelligent merging, enabling the synthesis of optimized molecules. FragmentGPT facilitates controlled, goal-driven molecular assembly. Experiments and ablation studies on real-world cancer datasets demonstrate its ability to generate chemically valid, high-quality molecules tailored for downstream drug discovery tasks.