Pierre Senellart

AI
h-index56
6papers
162citations
Novelty45%
AI Score36

6 Papers

LGOct 30, 2025
MSAD: A Deep Dive into Model Selection for Time series Anomaly Detection

Emmanouil Sylligardos, John Paparrizos, Themis Palpanas et al.

Anomaly detection is a fundamental task for time series analytics with important implications for the downstream performance of many applications. Despite increasing academic interest and the large number of methods proposed in the literature, recent benchmarks and evaluation studies demonstrated that no overall best anomaly detection methods exist when applied to very heterogeneous time series datasets. Therefore, the only scalable and viable solution to solve anomaly detection over very different time series collected from diverse domains is to propose a model selection method that will select, based on time series characteristics, the best anomaly detection methods to run. Existing AutoML solutions are, unfortunately, not directly applicable to time series anomaly detection, and no evaluation of time series-based approaches for model selection exists. Towards that direction, this paper studies the performance of time series classification methods used as model selection for anomaly detection. In total, we evaluate 234 model configurations derived from 16 base classifiers across more than 1980 time series, and we propose the first extensive experimental evaluation of time series classification as model selection for anomaly detection. Our results demonstrate that model selection methods outperform every single anomaly detection method while being in the same order of magnitude regarding execution time. This evaluation is the first step to demonstrate the accuracy and efficiency of time series classification algorithms for anomaly detection, and represents a strong baseline that can then be used to guide the model selection step in general AutoML pipelines. Preprint version of an article accepted at the VLDB Journal.

AIJul 18, 2023
Modular Multimodal Machine Learning for Extraction of Theorems and Proofs in Long Scientific Documents (Extended Version)

Shrey Mishra, Antoine Gauquier, Pierre Senellart

We address the extraction of mathematical statements and their proofs from scholarly PDF articles as a multimodal classification problem, utilizing text, font features, and bitmap image renderings of PDFs as distinct modalities. We propose a modular sequential multimodal machine learning approach specifically designed for extracting theorem-like environments and proofs. This is based on a cross-modal attention mechanism to generate multimodal paragraph embeddings, which are then fed into our novel multimodal sliding window transformer architecture to capture sequential information across paragraphs. Our document AI methodology stands out as it eliminates the need for OCR preprocessing, LaTeX sources during inference, or custom pre-training on specialized losses to understand cross-modality relationships. Unlike many conventional approaches that operate at a single-page level, ours can be directly applied to multi-page PDFs and seamlessly handles the page breaks often found in lengthy scientific mathematical documents. Our approach demonstrates performance improvements obtained by transitioning from unimodality to multimodality, and finally by incorporating sequential modeling over paragraphs.

SEJan 10, 2013Code
API Blender: A Uniform Interface to Social Platform APIs

Georges Gouriten, Pierre Senellart

With the growing success of the social Web, most Web developers have to interact with at least one social Web platform, which implies studying the related API specifications. These are often only informally described, may contain errors, lack harmonization, and generally speaking make the developer's work difficult. Most attempts to solve this problem, proposing formal description languages for Web service APIs, have had limited success outside of B2B applications; we believe it is due to their top-down nature. In addition, a programmer dealing with one or several of these APIs has to deal with a number of related tasks such as data integration, requests chaining, or policy management, that are cumbersome to implement. Inspired by the SPORE project, we present API Blender, an open-source solution to describe, interact with, and integrate the most common social Web APIs. In this perspective, we first introduce two new lightweight description formats for requests and services and demonstrate their relevance with respect to current platform APIs. We present our Python implementation of API Blender and its features regarding authentication, policy management and multi-platform data integration.

AINov 21, 2023
Extracting Definienda in Mathematical Scholarly Articles with Transformers

Shufan Jiang, Pierre Senellart

We consider automatically identifying the defined term within a mathematical definition from the text of an academic article. Inspired by the development of transformer-based natural language processing applications, we pose the problem as (a) a token-level classification task using fine-tuned pre-trained transformers; and (b) a question-answering task using a generalist large language model (GPT). We also propose a rule-based approach to build a labeled dataset from the LATEX source of papers. Experimental results show that it is possible to reach high levels of precision and recall using either recent (and expensive) GPT 4 or simpler pre-trained models fine-tuned on our task.

DBJan 12, 2024
Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic Databases

Pratik Karmakar, Mikaël Monet, Pierre Senellart et al.

Shapley values, originating in game theory and increasingly prominent in explainable AI, have been proposed to assess the contribution of facts in query answering over databases, along with other similar power indices such as Banzhaf values. In this work we adapt these Shapley-like scores to probabilistic settings, the objective being to compute their expected value. We show that the computations of expected Shapley values and of the expected values of Boolean functions are interreducible in polynomial time, thus obtaining the same tractability landscape. We investigate the specific tractable case where Boolean functions are represented as deterministic decomposable circuits, designing a polynomial-time algorithm for this setting. We present applications to probabilistic databases through database provenance, and an effective implementation of this algorithm within the ProvSQL system, which experimentally validates its feasibility over a standard benchmark.

LGMay 4, 2018
BelMan: Bayesian Bandits on the Belief--Reward Manifold

Debabrota Basu, Pierre Senellart, Stéphane Bressan

We propose a generic, Bayesian, information geometric approach to the exploration--exploitation trade-off in multi-armed bandit problems. Our approach, BelMan, uniformly supports pure exploration, exploration--exploitation, and two-phase bandit problems. The knowledge on bandit arms and their reward distributions is summarised by the barycentre of the joint distributions of beliefs and rewards of the arms, the \emph{pseudobelief-reward}, within the beliefs-rewards manifold. BelMan alternates \emph{information projection} and \emph{reverse information projection}, i.e., projection of the pseudobelief-reward onto beliefs-rewards to choose the arm to play, and projection of the resulting beliefs-rewards onto the pseudobelief-reward. It introduces a mechanism that infuses an exploitative bias by means of a \emph{focal distribution}, i.e., a reward distribution that gradually concentrates on higher rewards. Comparative performance evaluation with state-of-the-art algorithms shows that BelMan is not only competitive but can also outperform other approaches in specific setups, for instance involving many arms and continuous rewards.