Laurent Simon

CR
h-index26
9papers
99citations
Novelty43%
AI Score37

9 Papers

LGMay 28, 2025Code
Machine Learning Models Have a Supply Chain Problem

Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac et al. · deepmind

Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.

CRAug 21, 2025
PickleBall: Secure Deserialization of Pickle-based Machine Learning Models (Extended Report)

Andreas D. Kellas, Neophytos Christou, Wenxin Jiang et al.

Machine learning model repositories such as the Hugging Face Model Hub facilitate model exchanges. However, bad actors can deliver malware through compromised models. Existing defenses such as safer model formats, restrictive (but inflexible) loading policies, and model scanners have shortcomings: 44.9% of popular models on Hugging Face still use the insecure pickle format, 15% of these cannot be loaded by restrictive loading policies, and model scanners have both false positives and false negatives. Pickle remains the de facto standard for model exchange, and the ML community lacks a tool that offers transparent safe loading. We present PickleBall to help machine learning engineers load pickle-based models safely. PickleBall statically analyzes the source code of a given machine learning library and computes a custom policy that specifies a safe load-time behavior for benign models. PickleBall then dynamically enforces the policy during load time as a drop-in replacement for the pickle module. PickleBall generates policies that correctly load 79.8% of benign pickle-based models in our dataset, while rejecting all (100%) malicious examples in our dataset. In comparison, evaluated model scanners fail to identify known malicious models, and the state-of-art loader loads 22% fewer benign models than PickleBall. PickleBall removes the threat of arbitrary function invocation from malicious pickle-based models, raising the bar for attackers to depend on code reuse techniques.

SDFeb 10, 2025
Evaluation of Deep Audio Representations for Hearables

Fabian Gröger, Pascal Baumann, Ludovic Amruthalingam et al.

Effectively steering hearable devices requires understanding the acoustic environment around the user. In the computational analysis of sound scenes, foundation models have emerged as the state of the art to produce high-performance, robust, multi-purpose audio representations. We introduce and release Deep Evaluation of Audio Representations (DEAR), the first dataset and benchmark to evaluate the efficacy of foundation models in capturing essential acoustic properties for hearables. The dataset includes 1,158 audio tracks, each 30 seconds long, created by spatially mixing proprietary monologues with commercial, high-quality recordings of everyday acoustic scenes. Our benchmark encompasses eight tasks that assess the general context, speech sources, and technical acoustic properties of the audio scenes. Through our evaluation of four general-purpose audio representation models, we demonstrate that the BEATs model significantly surpasses its counterparts. This superiority underscores the advantage of models trained on diverse audio collections, confirming their applicability to a wide array of auditory tasks, including encoding the environment properties necessary for hearable steering. The DEAR dataset and associated code are available at https://dear-dataset.github.io.

DCMay 14, 2024
Drift Detection: Introducing Gaussian Split Detector

Maxime Fuccellaro, Laurent Simon, Akka Zemmari

Recent research yielded a wide array of drift detectors. However, in order to achieve remarkable performance, the true class labels must be available during the drift detection phase. This paper targets at detecting drift when the ground truth is unknown during the detection phase. To that end, we introduce Gaussian Split Detector (GSD) a novel drift detector that works in batch mode. GSD is designed to work when the data follow a normal distribution and makes use of Gaussian mixture models to monitor changes in the decision boundary. The algorithm is designed to handle multi-dimension data streams and to work without the ground truth labels during the inference phase making it pertinent for real world use. In an extensive experimental study on real and synthetic datasets, we evaluate our detector against the state of the art. We show that our detector outperforms the state of the art in detecting real drift and in ignoring virtual drift which is key to avoid false alarms.

AIJun 2, 2020
SAT Heritage: a community-driven effort for archiving, building and running more than thousand SAT solvers

Gilles Audemard, Loïc Paulevé, Laurent Simon

SAT research has a long history of source code and binary releases, thanks to competitions organized every year. However, since every cycle of competitions has its own set of rules and an adhoc way of publishing source code and binaries, compiling or even running any solver may be harder than what it seems. Moreover, there has been more than a thousand solvers published so far, some of them released in the early 90's. If the SAT community wants to archive and be able to keep track of all the solvers that made its history, it urgently needs to deploy an important effort. We propose to initiate a community-driven effort to archive and to allow easy compilation and running of all SAT solvers that have been released so far. We rely on the best tools for archiving and building binaries (thanks to Docker, GitHub and Zenodo) and provide a consistent and easy way for this. Thanks to our tool, building (or running) a solver from its source (or from its binary) can be done in one line.

CRMar 26, 2019
Hearing your touch: A new acoustic side channel on smartphones

Ilia Shumailov, Laurent Simon, Jeff Yan et al.

We present the first acoustic side-channel attack that recovers what users type on the virtual keyboard of their touch-screen smartphone or tablet. When a user taps the screen with a finger, the tap generates a sound wave that propagates on the screen surface and in the air. We found the device's microphone(s) can recover this wave and "hear" the finger's touch, and the wave's distortions are characteristic of the tap's location on the screen. Hence, by recording audio through the built-in microphone(s), a malicious app can infer text as the user enters it on their device. We evaluate the effectiveness of the attack with 45 participants in a real-world environment on an Android tablet and an Android smartphone. For the tablet, we recover 61% of 200 4-digit PIN-codes within 20 attempts, even if the model is not trained with the victim's data. For the smartphone, we recover 9 words of size 7--13 letters with 50 attempts in a common side-channel attack benchmark. Our results suggest that it not always sufficient to rely on isolation mechanisms such as TrustZone to protect user input. We propose and discuss hardware, operating-system and application-level mechanisms to block this attack more effectively. Mobile devices may need a richer capability model, a more user-friendly notification system for sensor usage and a more thorough evaluation of the information leaked by the underlying hardware.

AIJun 10, 2016
Community Structure in Industrial SAT Instances

Carlos Ansótegui, Maria Luisa Bonet, Jesús Giráldez-Cru et al.

Modern SAT solvers have experienced a remarkable progress on solving industrial instances. Most of the techniques have been developed after an intensive experimental process. It is believed that these techniques exploit the underlying structure of industrial instances. However, there are few works trying to exactly characterize the main features of this structure. The research community on complex networks has developed techniques of analysis and algorithms to study real-world graphs that can be used by the SAT community. Recently, there have been some attempts to analyze the structure of industrial SAT instances in terms of complex networks, with the aim of explaining the success of SAT solving techniques, and possibly improving them. In this paper, inspired by the results on complex networks, we study the community structure, or modularity, of industrial SAT instances. In a graph with clear community structure, or high modularity, we can find a partition of its nodes into communities such that most edges connect variables of the same community. In our analysis, we represent SAT instances as graphs, and we show that most application benchmarks are characterized by a high modularity. On the contrary, random SAT instances are closer to the classical Erdös-Rényi random graph model, where no structure can be observed. We also analyze how this structure evolves by the effects of the execution of a CDCL SAT solver. In particular, we use the community structure to detect that new clauses learned by the solver during the search contribute to destroy the original structure of the formula. This is, learned clauses tend to contain variables of distinct communities.

CRDec 23, 2014
Systemization of Pluggable Transports for Censorship Resistance

Sheharbano Khattak, Laurent Simon, Steven J. Murdoch

An increasing number of countries implement Internet censorship at different scales and for a variety of reasons. In particular, the link between the censored client and entry point to the uncensored network is a frequent target of censorship due to the ease with which a nation-state censor can control it. A number of censorship resistance systems have been developed thus far to help circumvent blocking on this link, which we refer to as link circumvention systems (LCs). The variety and profusion of attack vectors available to a censor has led to an arms race, leading to a dramatic speed of evolution of LCs. Despite their inherent complexity and the breadth of work in this area, there is no systematic way to evaluate link circumvention systems and compare them against each other. In this paper, we (i) sketch an attack model to comprehensively explore a censor's capabilities, (ii) present an abstract model of a LC, a system that helps a censored client communicate with a server over the Internet while resisting censorship, (iii) describe an evaluation stack that underscores a layered approach to evaluate LCs, and (iv) systemize and evaluate existing censorship resistance systems that provide link circumvention. We highlight open challenges in the evaluation and development of LCs and discuss possible mitigations.

MMJul 6, 2014
Sonic interaction with a virtual orchestra of factory machinery

Laurent Simon, Florian Nouviale, Ronan Gaugne et al.

This paper presents an immersive application where users receive sound and visual feedbacks on their interactions with a virtual environment. In this application, the users play the part of conductors of an orchestra of factory machines since each of their actions on interaction devices triggers a pair of visual and audio responses. Audio stimuli were spatialized around the listener. The application was exhibited during the 2013 Science and Music day and designed to be used in a large immersive system with head tracking, shutter glasses and a 10.2 loudspeaker configuration.