96.9CEMay 22
GeoCycler: Reward-Aligned 3D Diffusion for Constraint-Conditioned Cyclic Peptide DesignJingjie Zhang, Hanqun Cao, Haosen Shi et al.
Cyclic peptides are attractive therapeutic modalities because their closed-ring topology can improve stability and target specificity. However, de novo cyclic peptide design remains challenging for diffusion generators, as macrocyclization requires satisfying sparse, non-smooth, and compositional geometric constraints. Existing constraint-conditioned methods largely rely on inference-time guidance, which can steer samples toward desired closures but does not directly change the learned generative distribution. We propose GeoCycler, a reward-weighted diffusion alignment framework for training conditional latent diffusion models toward macrocyclization feasibility. GeoCycler introduces a type-gated stair reward that activates distance-based shaping only when prerequisite residue or linker types are satisfied, providing dense geometric feedback while avoiding misleading signals from chemically incompatible anchors. Together with positive-only reward weighting and replay-based stabilization, GeoCycler aligns a single generator across multiple cyclization topologies. On the LNR benchmark, GeoCycler improves pass@5 closure success over strong guidance-based baselines across stapled, head-to-tail, disulfide, and bicyclic settings. In particular, it improves head-to-tail success by 20.8 percentage points over CP-Composer while maintaining comparable amino-acid and backbone-dihedral statistics. These results suggest that training-time alignment to sparse geometric constraints is a promising alternative to relying solely on post hoc sampling-time correction for cyclic peptide generation.
CVJan 27
Pareto-Guided Optimization for Uncertainty-Aware Medical Image SegmentationJinming Zhang, Xi Yang, Youpeng Yang et al.
Uncertainty in medical image segmentation is inherently non-uniform, with boundary regions exhibiting substantially higher ambiguity than interior areas. Conventional training treats all pixels equally, leading to unstable optimization during early epochs when predictions are unreliable. We argue that this instability hinders convergence toward Pareto-optimal solutions and propose a region-wise curriculum strategy that prioritizes learning from certain regions and gradually incorporates uncertain ones, reducing gradient variance. Methodologically, we introduce a Pareto-consistent loss that balances trade-offs between regional uncertainties by adaptively reshaping the loss landscape and constraining convergence dynamics between interior and boundary regions; this guides the model toward Pareto-approximate solutions. To address boundary ambiguity, we further develop a fuzzy labeling mechanism that maintains binary confidence in non-boundary areas while enabling smooth transitions near boundaries, stabilizing gradients, and expanding flat regions in the loss surface. Experiments on brain metastasis and non-metastatic tumor segmentation show consistent improvements across multiple configurations, with our method outperforming traditional crisp-set approaches in all tumor subregions.
LGAug 25, 2023
Learning Compact Neural Networks with Deep Overparameterised Multitask LearningShen Ren, Haosen Shi
Compact neural network offers many benefits for real-world applications. However, it is usually challenging to train the compact neural networks with small parameter sizes and low computational costs to achieve the same or better model performance compared to more complex and powerful architecture. This is particularly true for multitask learning, with different tasks competing for resources. We present a simple, efficient and effective multitask learning overparameterisation neural network design by overparameterising the model architecture in training and sharing the overparameterised model parameters more effectively across tasks, for better optimisation and generalisation. Experiments on two challenging multitask datasets (NYUv2 and COCO) demonstrate the effectiveness of the proposed method across various convolutional networks and parameter sizes.
LGJun 1, 2021
An Entropy Regularization Free Mechanism for Policy-based Reinforcement LearningChangnan Xiao, Haosen Shi, Jiajun Fan et al.
Policy-based reinforcement learning methods suffer from the policy collapse problem. We find valued-based reinforcement learning methods with ε-greedy mechanism are capable of enjoying three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, there does not exist a parallel mechanism for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy regularization free mechanism that is designed for policy-based methods, which achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that our mechanism is super sample-efficient for policy-based methods and boosts a policy-based baseline to a new State-Of-The-Art on Arcade Learning Environment.
LGMay 9, 2021
CASA: Bridging the Gap between Policy Improvement and Policy Evaluation with Conflict Averse Policy IterationChangnan Xiao, Haosen Shi, Jiajun Fan et al.
We study the problem of model-free reinforcement learning, which is often solved following the principle of Generalized Policy Iteration (GPI). While GPI is typically an interplay between policy evaluation and policy improvement, most conventional model-free methods assume the independence of the granularity and other details of the GPI steps, despite of the inherent connections between them. In this paper, we present a method that regularizes the inconsistency between policy evaluation and policy improvement, leading to a conflict averse GPI solution with reduced functional approximation error. To this end, we formulate a novel learning paradigm where taking the policy evaluation step is equivalent to some compensation of performing policy improvement, and thus effectively alleviates the gradient conflict between the two GPI steps. We also show that the form of our proposed solution is equivalent to performing entropy-regularized policy improvement and therefore prevents the policy from being trapped into suboptimal solutions. We conduct extensive experiments to evaluate our method on the Arcade Learning Environment (ALE). Empirical results show that our method outperforms several strong baselines in major evaluation domains.