Arthur Pignet

LG
h-index: 3
Papers: 6
Citations: 9
Novelty: 47%
AI Score: 45

6 Papers

AI | May 26 | Code
Laguna M.1/XS.2 Technical Report

Julien Abadji, Marah Abdin, Connor Adams et al.

We present Laguna M.1 and Laguna XS.2, two Mixture-of-Experts foundation models built for long-horizon, agentic coding: M.1 has $225.8$B total parameters ($23.4$B activated per token) and XS.2 has $33.4$B total ($3$B activated). Both models were trained from scratch end-to-end inside the same internal system that we refer to as our Model Factory: a tightly integrated stack of versioned data, training, evaluation, and inference components that turn model development into an industrial process. We describe the principles and design choices of the Model Factory and also detail the end-to-end training process of our models, covering pre-training data and architecture, post-training stages, evaluation, and quantization. On agentic software engineering and terminal benchmarks (SWE-bench Verified, SWE-bench Multilingual, SWE-Bench Pro, and Terminal-Bench 2.0), M.1 and XS.2 are competitive with state-of-the-art open models in their respective weight classes. Laguna XS.2 weights are released under Apache 2.0 at https://huggingface.co/collections/poolside/laguna-xs2.
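The parameter counts quoted in this abstract imply that only about a tenth of each model is active for any given token. The short sketch below works out that ratio; the dictionary layout and the `active_fraction` helper are illustrative names, not anything from the report.

```python
# Parameter counts in billions, taken from the abstract above.
models = {
    "Laguna M.1": {"total_b": 225.8, "active_b": 23.4},
    "Laguna XS.2": {"total_b": 33.4, "active_b": 3.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the total parameters that is activated per token."""
    return active_b / total_b


for name, params in models.items():
    share = active_fraction(**params)
    print(f"{name}: {share:.1%} of parameters active per token")
```

This ratio only describes parameter counts; it says nothing about how experts are routed or how the report measures compute.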

LG | Jun 13, 2023
SRATTA : Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning

Tanguy Marchand, Régis Loeb, Ulysse Marteau-Ferey et al.

We consider a cross-silo federated learning (FL) setting where a machine learning model with a fully connected first layer is trained between different clients and a central server using FedAvg, and where the aggregation step can be performed with secure aggregation (SA). We present SRATTA, an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together. While sample recovery has already been explored in an FL setting, the ability to group samples per client, despite the use of SA, is novel. This poses a significant unforeseen security threat to FL and effectively breaks SA. We show that SRATTA is both theoretically grounded and can be used in practice on realistic models and datasets. We also propose counter-measures, and claim that clients should play an active role to guarantee their privacy during training.
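For readers unfamiliar with the setting, here is a minimal sketch of what "FedAvg with secure aggregation" means for the server: it only ever sees the weighted sum of client updates, never an individual update. The sketch uses pairwise random masks that cancel in the sum, which is one textbook way to illustrate SA; real protocols work over finite fields with key agreement, and the function names here are hypothetical. It does not implement the SRATTA attack itself.

```python
import numpy as np


def pairwise_masks(n_clients: int, dim: int, rng: np.random.Generator) -> np.ndarray:
    """Build one mask per client such that all masks sum to zero."""
    masks = np.zeros((n_clients, dim))
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            shared = rng.normal(size=dim)  # secret shared by clients i and j
            masks[i] += shared
            masks[j] -= shared
    return masks


def secure_fedavg(updates, weights, rng: np.random.Generator) -> np.ndarray:
    """Return the FedAvg aggregate computed from masked client contributions."""
    updates = np.asarray(updates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # e.g. proportional to local dataset size
    masks = pairwise_masks(len(updates), updates.shape[1], rng)
    # Each row is what a single client would send: its weighted update plus a mask.
    masked = weights[:, None] * updates + masks
    # The masks cancel in the sum, so only the aggregate is revealed.
    return masked.sum(axis=0)


rng = np.random.default_rng(0)
client_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [10, 20, 30]
print(secure_fedavg(client_updates, client_sizes, rng))
```

The abstract's claim is that this aggregate alone, observed over training, can be enough to recover samples and group them by client.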

COMP-PH | Aug 11, 2022
Cross Section Doppler Broadening prediction using Physically Informed Deep Neural Networks

Arthur Pignet, Luiz Leal, Vaibhav Jaiswal

The temperature dependence of the neutron-nucleus interaction is known as the Doppler broadening of the cross-sections. This is a well-known effect due to the thermal motion of the target nuclei that occurs in the neutron-nucleus interaction. The fast computation of such effects is crucial for any nuclear application. Methods have been developed that allow determining the Doppler effects in the cross-section, most of them based on the numerical resolution of the equation known as Solbrig's kernel, which is a cross-section Doppler broadening formalism derived from a free-gas hypothesis for the distribution of target atoms. This paper explores a novel non-linear approach based on deep learning techniques. Deep neural networks are trained on synthetic and experimental data, serving as an alternative to conventional cross-section Doppler broadening (DB) computation. This paper explores the possibility of using physically informed neural networks, where the network is physically regularized to be the solution of a partial differential equation, inferred from Solbrig's kernel. The learning process is demonstrated by using the fission, capture, and scattering cross sections for $^{235}$U in the energy range from thermal to 2250 eV.

LG | Aug 22, 2025 | Code
OwkinZero: Accelerating Biological Discovery with AI

Nathan Bigaud, Vincent Cabeli, Meltem Gürel et al.

While large language models (LLMs) are rapidly advancing scientific research, they continue to struggle with core biological reasoning tasks essential for translational and biomedical discovery. To address this limitation, we created and curated eight comprehensive benchmark datasets comprising over 300,000 verifiable question-and-answer pairs, each targeting critical challenges in drug discovery, including target druggability, modality suitability, and drug perturbation effects. Using this resource, we developed the OwkinZero models by post-training open-source LLMs through a Reinforcement Learning from Verifiable Rewards strategy. Our results demonstrate that specialized 8-32B OwkinZero models substantially outperform larger, state-of-the-art commercial LLMs on these biological benchmarks. Remarkably, we uncover evidence of a key aspect of generalization: specialist models trained on a single task consistently outperform their base models on previously unseen tasks. This generalization effect is further amplified in our comprehensive OwkinZero models, which were trained on a mixture of datasets and achieve even broader cross-task improvements. This study represents a significant step toward addressing the biological reasoning blind spot in current LLMs, demonstrating that targeted reinforcement learning on carefully curated data can unlock generalizable performance in specialized models, thereby accelerating AI-driven biological discovery.

LG | Oct 30, 2024
Legitimate ground-truth-free metrics for deep uncertainty classification scoring

Arthur Pignet, Chiara Regniez, John Klein

Despite the increasing demand for safer machine learning practices, the use of Uncertainty Quantification (UQ) methods in production remains limited. This limitation is exacerbated by the challenge of validating UQ methods in the absence of UQ ground truth. In classification tasks, when only a usual set of test data is at hand, several authors have suggested different metrics that can be computed from such test points while assessing the quality of quantified uncertainties. This paper investigates such metrics and proves that they are theoretically well-behaved and actually tied to some uncertainty ground truth which is easily interpretable in terms of model prediction trustworthiness ranking. Equipped with those new results, and given the applicability of those metrics in the usual supervised paradigm, we argue that our contributions will help promote a broader use of UQ in deep learning.
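As an illustration of a metric that needs only labeled test points and no uncertainty ground truth, the sketch below computes the AUROC of an uncertainty score for detecting misclassified samples. This is one commonly used metric of this general kind and is an assumption on our part; the abstract does not say which metrics the paper analyzes. The function name is hypothetical.

```python
import numpy as np


def misclassification_auroc(uncertainty, correct) -> float:
    """Probability that a wrong prediction gets a higher uncertainty score
    than a correct one (ties count for one half)."""
    scores = np.asarray(uncertainty, dtype=float)
    is_correct = np.asarray(correct, dtype=bool)
    wrong, right = scores[~is_correct], scores[is_correct]
    if wrong.size == 0 or right.size == 0:
        raise ValueError("need at least one correct and one wrong prediction")
    # Compare every wrong prediction with every correct one.
    diff = wrong[:, None] - right[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Toy example: the two misclassified samples carry the highest uncertainty.
print(misclassification_auroc([0.9, 0.1, 0.8, 0.2], [False, True, False, True]))
```

A value near 1 means the uncertainty score ranks untrustworthy predictions above trustworthy ones, which is the ranking view the abstract refers to.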

CV | Feb 27, 2025
Robust sensitivity control in digital pathology via tile score distribution matching

Arthur Pignet, John Klein, Genevieve Robin et al.

Deploying digital pathology models across medical centers is challenging due to distribution shifts. Recent advances in domain generalization improve model transferability in terms of aggregated performance measured by the Area Under Curve (AUC). However, clinical regulations often require controlling the transferability of other metrics, such as prescribed sensitivity levels. We introduce a novel approach to control the sensitivity of whole slide image (WSI) classification models, based on optimal transport and Multiple Instance Learning (MIL). Validated across multiple cohorts and tasks, our method enables robust sensitivity control with only a handful of calibration samples, providing a practical solution for reliable deployment of computational pathology systems.
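To make "controlling a prescribed sensitivity level" concrete, the sketch below shows a plain baseline: pick the decision threshold from the scores of positive calibration slides so that the target sensitivity is met on that calibration set. This is not the optimal-transport method of the paper, only the naive quantile approach it would be compared against conceptually; the function name and the toy scores are hypothetical.

```python
import math

import numpy as np


def threshold_for_sensitivity(positive_scores, target: float) -> float:
    """Largest threshold t such that predicting positive when score >= t
    reaches at least `target` sensitivity on the calibration positives."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    scores = np.sort(np.asarray(positive_scores, dtype=float))[::-1]  # descending
    # Number of positives that must be flagged; the epsilon guards float rounding.
    needed = max(1, math.ceil(target * len(scores) - 1e-9))
    return float(scores[needed - 1])


calibration_positives = [0.9, 0.7, 0.5, 0.3, 0.1]  # toy slide-level scores
threshold = threshold_for_sensitivity(calibration_positives, target=0.8)
achieved = np.mean(np.asarray(calibration_positives) >= threshold)
print(threshold, achieved)
```

With only a handful of calibration samples, a threshold chosen this way can transfer poorly to a new center, which is the gap the abstract says the proposed method addresses.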