Aya Laajil

LG
h-index9
3papers
6citations
Novelty65%
AI Score42

3 Papers

LGJan 30
Calibrated Multivariate Distributional Regression with Pre-Rank Regularization

Aya Laajil, Elnura Zhalieva, Naomi Desobry et al.

The goal of probabilistic prediction is to issue predictive distributions that are as informative as possible, subject to being calibrated. Despite substantial progress in the univariate setting, achieving multivariate calibration remains challenging. Recent work has introduced pre-rank functions, scalar projections of multivariate forecasts and observations, as flexible diagnostics for assessing specific aspects of multivariate calibration, but their use has largely been limited to post-hoc evaluation. We propose a regularization-based calibration method that enforces multivariate calibration during training of multivariate distributional regression models using pre-rank functions. We further introduce a novel PCA-based pre-rank that projects predictions onto principal directions of the predictive distribution. Through simulation studies and experiments on 18 real-world multi-output regression datasets, we show that the proposed approach substantially improves multivariate pre-rank calibration without compromising predictive accuracy, and that the PCA pre-rank reveals dependence-structure misspecifications that are not detected by existing pre-ranks.

LGOct 4, 2025
Curriculum-Augmented GFlowNets For mRNA Sequence Generation

Aya Laajil, Abduragim Shtanchaev, Sajan Muhammad et al.

Designing mRNA sequences is a major challenge in developing next-generation therapeutics, since it involves exploring a vast space of possible nucleotide combinations while optimizing sequence properties like stability, translation efficiency, and protein expression. While Generative Flow Networks are promising for this task, their training is hindered by sparse, long-horizon rewards and multi-objective trade-offs. We propose Curriculum-Augmented GFlowNets (CAGFN), which integrate curriculum learning with multi-objective GFlowNets to generate de novo mRNA sequences. CAGFN integrates a length-based curriculum that progressively adapts the maximum sequence length guiding exploration from easier to harder subproblems. We also provide a new mRNA design environment for GFlowNets which, given a target protein sequence and a combination of biological objectives, allows for the training of models that generate plausible mRNA candidates. This provides a biologically motivated setting for applying and advancing GFlowNets in therapeutic sequence design. On different mRNA design tasks, CAGFN improves Pareto performance and biological plausibility, while maintaining diversity. Moreover, CAGFN reaches higher-quality solutions faster than a GFlowNet trained with random sequence sampling (no curriculum), and enables generalization to out-of-distribution sequences.

LGMay 21, 2025
Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets

Idriss Malek, Aya Laajil, Abhijith Sharma et al.

Although Generative Flow Networks (GFlowNets) are designed to capture multiple modes of a reward function, they often suffer from mode collapse in practice, getting trapped in early-discovered modes and requiring prolonged training to find diverse solutions. Existing exploration techniques often rely on heuristic novelty signals. We propose Loss-Guided GFlowNets (LGGFN), a novel approach where an auxiliary GFlowNet's exploration is \textbf{directly driven by the main GFlowNet's training loss}. By prioritizing trajectories where the main model exhibits \textbf{high loss}, LGGFN focuses sampling on poorly understood regions of the state space. This targeted exploration significantly accelerates the discovery of diverse, high-reward samples. Empirically, across \textbf{diverse benchmarks} including grid environments, structured sequence generation, Bayesian structure learning, and biological sequence design, LGGFN consistently \textbf{outperforms} baselines in exploration efficiency and sample diversity. For instance, on a challenging sequence generation task, it discovered over 40 times more unique valid modes while simultaneously reducing the exploration error metric by approximately 99\%.