Qixuan Sun

7.5 · LG · Jun 29

Preference-based Antibody Expression Ranking: Scaling with Large-scale Weak Supervision

Josh Qixuan Sun, Morteza Babaie, Wenyang Hou et al.

Antibody expression ranking is a critical task in antibody design, yet its modelling is severely hindered by the scarcity of labeled experimental data. To address this, we propose a unified preference-based learning framework that integrates scarce quantitative expression data with large-scale weak positive supervision from immunization data. We adapt Direct Preference Optimization (DPO) to protein language models by introducing a union-masked log-likelihood approximation and IMGT-based alignment, enabling efficient training on variable-length sequences. Evaluating on a diverse internal dataset of 1254 labeled sequences and 4 million unlabeled camelid-derived antibodies, we show that our method consistently outperforms baselines on most metrics. Our results demonstrate that preference learning can effectively learn from weak supervision, providing a scalable solution for antibody expressibility optimization in data-constrained settings. Project page: https://kisoji-biotechnology-inc.github.io/Preference-Expression-Ranking/.
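The abstract names Direct Preference Optimization (DPO) as the training objective. The sketch below shows only the standard per-pair DPO loss. It assumes the four sequence log-likelihoods are already computed; for a masked protein language model they would come from an approximation such as the paper's union-masked scheme, which is not reproduced here. The function names `dpo_loss` and `softplus` and the default `beta=0.1` are illustrative assumptions, not the authors' code.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed temperature; not taken from the paper
) -> float:
    """Standard DPO loss for a single preference pair.

    logp_* are log-likelihoods of the preferred ("chosen") and
    dispreferred ("rejected") sequences under the model being trained;
    ref_logp_* are the same quantities under a frozen reference model.
    How these are approximated for a masked protein LM is assumed to be
    handled upstream (hypothetical here).
    """
    # Implicit reward margin: how much more the trained model, relative
    # to the reference, favours the chosen sequence over the rejected one.
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) == softplus(-margin), computed without overflow.
    return softplus(-margin)
```

When the trained model and the reference agree, the margin is zero and the loss equals log 2. Raising the relative likelihood of the chosen sequence pushes the loss below log 2. Because the loss depends only on log-likelihood differences against a reference, each pair needs no absolute expression label, only an ordering.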

1.4 · CL · Apr 15, 2021 · Code

A Dual-Questioning Attention Network for Emotion-Cause Pair Extraction with Context Awareness

Qixuan Sun, Yaqi Yin, Hong Yu

Emotion-cause pair extraction (ECPE), an emerging task in sentiment analysis, aims to extract pairs of emotions and their corresponding causes from documents. It is more challenging than emotion cause extraction (ECE) because it is given no emotion annotations, which have been shown to play an important role in ECE. Existing work follows a two-stage pipeline that identifies emotions and causes in the first step and pairs them in the second. However, error propagation across steps and pairing without contextual information limit its effectiveness. We therefore propose a Dual-Questioning Attention Network to alleviate these limitations. Specifically, we independently use candidate emotions and causes to question the context through attention networks, obtaining contextual and semantic answers. We also explore how weighted loss functions can control error propagation between steps. Empirical results show that our method outperforms baselines on multiple evaluation metrics. The source code is available at https://github.com/QixuanSun/DQAN.
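The abstract describes candidate emotions and causes each "questioning" the document context through attention. The sketch below illustrates that idea under an explicit assumption: single-query scaled dot-product attention over precomputed clause vectors. It is not the DQAN architecture, and the names `question_context`, `softmax` and `dot` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_context(
    candidate: Vector, context: List[Vector]
) -> Tuple[List[float], List[float]]:
    """Attend from one candidate clause (the 'question') over all context clauses.

    Assumption: clauses are already encoded as equal-length vectors.
    Returns the attention weights and the context-aware 'answer', i.e. the
    attention-weighted sum of the context clause vectors.
    """
    d = len(candidate)
    scores = [dot(candidate, c) / math.sqrt(d) for c in context]
    weights = softmax(scores)
    answer = [sum(w * c[i] for w, c in zip(weights, context)) for i in range(d)]
    return weights, answer


# Emotion and cause candidates question the same context independently.
context = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
emotion_weights, emotion_answer = question_context([1.0, 0.0], context)
cause_weights, cause_answer = question_context([0.0, 1.0], context)
```

Because each candidate queries the context separately, the emotion answer and the cause answer each carry document-level information before any pairing takes place. That is one way contextual information could enter the pairing step.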


2 Papers