Laetitia Chapel

h-index18

16papers

644citations

Novelty51%

AI Score46

Ranked #64,976 of 201,326 authors (top 32%)#14,750 in LG (top 35%)

16 Papers

LGFeb 6Code

Vision Transformer Finetuning Benefits from Non-Smooth Components

Ambroise Odonnat, Laetitia Chapel, Romain Tavenard et al.

The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer learning remains poorly understood. In this paper, we analyze the ability of vision transformer components to adapt their outputs to changes in inputs, or, in other words, their plasticity. Defined as an average rate of change, it captures the sensitivity to input perturbation; in particular, a high plasticity implies low smoothness. We demonstrate through theoretical analysis and comprehensive experiments that this perspective provides principled guidance in choosing the components to prioritize during adaptation. A key takeaway for practitioners is that the high plasticity of the attention modules and feedforward layers consistently leads to better finetuning performance. Our findings depart from the prevailing assumption that smoothness is desirable, offering a novel perspective on the functional properties of transformers. The code is available at https://github.com/ambroiseodt/vit-plasticity.

LGNov 18, 2022

Hyperbolic Sliced-Wasserstein via Geodesic and Horospherical Projections

Clément Bonet, Laetitia Chapel, Lucas Drumetz et al.

It has been shown beneficial for many types of data which present an underlying hierarchical structure to be embedded in hyperbolic spaces. Consequently, many tools of machine learning were extended to such spaces, but only few discrepancies to compare probability distributions defined over those spaces exist. Among the possible candidates, optimal transport distances are well defined on such Riemannian manifolds and enjoy strong theoretical properties, but suffer from high computational cost. On Euclidean spaces, sliced-Wasserstein distances, which leverage a closed-form of the Wasserstein distance in one dimension, are more computationally efficient, but are not readily available on hyperbolic spaces. In this work, we propose to derive novel hyperbolic sliced-Wasserstein discrepancies. These constructions use projections on the underlying geodesics either along horospheres or geodesics. We study and compare them on different tasks where hyperbolic representations are relevant, such as sampling or image classification.

LGAug 24, 2023

Match-And-Deform: Time Series Domain Adaptation through Optimal Transport and Temporal Alignment

François Painblanc, Laetitia Chapel, Nicolas Courty et al.

While large volumes of unlabeled data are usually available, associated labels are often scarce. The unsupervised domain adaptation problem aims at exploiting labels from a source domain to classify data from a related, yet different, target domain. When time series are at stake, new difficulties arise as temporal shifts may appear in addition to the standard feature distribution shift. In this paper, we introduce the Match-And-Deform (MAD) approach that aims at finding correspondences between the source and target time series while allowing temporal distortions. The associated optimization problem simultaneously aligns the series thanks to an optimal transport loss and the time stamps through dynamic time warping. When embedded into a deep neural network, MAD helps learning new representations of time series that both align the domains and maximize the discriminative power of the network. Empirical studies on benchmark datasets and remote sensing data demonstrate that MAD makes meaningful sample-to-sample pairing and time shift estimation, reaching similar or better classification performance than state-of-the-art deep time series domain adaptation strategies.

MLJul 4, 2023

Fast Optimal Transport through Sliced Wasserstein Generalized Geodesics

Guillaume Mahey, Laetitia Chapel, Gilles Gasso et al.

Wasserstein distance (WD) and the associated optimal transport plan have been proven useful in many applications where probability measures are at stake. In this paper, we propose a new proxy of the squared WD, coined min-SWGG, that is based on the transport map induced by an optimal one-dimensional projection of the two input distributions. We draw connections between min-SWGG and Wasserstein generalized geodesics in which the pivot measure is supported on a line. We notably provide a new closed form for the exact Wasserstein distance in the particular case of one of the distributions supported on a line allowing us to derive a fast computational scheme that is amenable to gradient descent optimization. We show that min-SWGG is an upper bound of WD and that it has a complexity similar to as Sliced-Wasserstein, with the additional feature of providing an associated transport plan. We also investigate some theoretical properties such as metricity, weak convergence, computational and topological properties. Empirical evidences support the benefits of min-SWGG in various contexts, from gradient flows, shape matching and image colorization, among others.

LGMay 28, 2025

Differentiable Generalized Sliced Wasserstein Plans

Laetitia Chapel, Romain Tavenard, Samuel Vaiter

Optimal Transport (OT) has attracted significant interest in the machine learning community, not only for its ability to define meaningful distances between probability distributions -- such as the Wasserstein distance -- but also for its formulation of OT plans. Its computational complexity remains a bottleneck, though, and slicing techniques have been developed to scale OT to large datasets. Recently, a novel slicing scheme, dubbed min-SWGG, lifts a single one-dimensional plan back to the original multidimensional space, finally selecting the slice that yields the lowest Wasserstein distance as an approximation of the full OT plan. Despite its computational and theoretical advantages, min-SWGG inherits typical limitations of slicing methods: (i) the number of required slices grows exponentially with the data dimension, and (ii) it is constrained to linear projections. Here, we reformulate min-SWGG as a bilevel optimization problem and propose a differentiable approximation scheme to efficiently identify the optimal slice, even in high-dimensional settings. We furthermore define its generalized extension for accommodating to data living on manifolds. Finally, we demonstrate the practical value of our approach in various applications, including gradient flows on manifolds and high-dimensional spaces, as well as a novel sliced OT-based conditional flow matching for image generation -- where fast computation of transport plans is essential.

CVMar 5

Layer by layer, module by module: Choose both for optimal OOD probing of ViT

Ambroise Odonnat, Vasilii Feofanov, Laetitia Chapel et al.

Recent studies have observed that intermediate layers of foundation models often yield more discriminative representations than the final layer. While initially attributed to autoregressive pretraining, this phenomenon has also been identified in models trained via supervised and discriminative self-supervised objectives. In this paper, we conduct a comprehensive study to analyze the behavior of intermediate layers in pretrained vision transformers. Through extensive linear probing experiments across a diverse set of image classification benchmarks, we find that distribution shift between pretraining and downstream data is the primary cause of performance degradation in deeper layers. Furthermore, we perform a fine-grained analysis at the module level. Our findings reveal that standard probing of transformer block outputs is suboptimal; instead, probing the activation within the feedforward network yields the best performance under significant distribution shift, whereas the normalized output of the multi-head self-attention module is optimal when the shift is weak.

LGMay 27, 2025

Bridging Arbitrary and Tree Metrics via Differentiable Gromov Hyperbolicity

Pierre Houedry, Nicolas Courty, Florestan Martin-Baillon et al.

Trees and the associated shortest-path tree metrics provide a powerful framework for representing hierarchical and combinatorial structures in data. Given an arbitrary metric space, its deviation from a tree metric can be quantified by Gromov's $δ$-hyperbolicity. Nonetheless, designing algorithms that bridge an arbitrary metric to its closest tree metric is still a vivid subject of interest, as most common approaches are either heuristical and lack guarantees, or perform moderately well. In this work, we introduce a novel differentiable optimization framework, coined DeltaZero, that solves this problem. Our method leverages a smooth surrogate for Gromov's $δ$-hyperbolicity which enables a gradient-based optimization, with a tractable complexity. The corresponding optimization procedure is derived from a problem with better worst case guarantees than existing bounds, and is justified statistically. Experiments on synthetic and real-world datasets demonstrate that our method consistently achieves state-of-the-art distortion.

OCJun 8, 2021

Unbalanced Optimal Transport through Non-negative Penalized Linear Regression

Laetitia Chapel, Rémi Flamary, Haoran Wu et al.

This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to propose novel algorithms inspired from inverse problems and nonnegative matrix factorization. In particular, we consider majorization-minimization which leads in our setting to efficient multiplicative updates for a variety of penalties. Furthermore, we derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties. The proposed algorithm provides a continuity of piece-wise linear OT plans converging to the solution of balanced OT (corresponding to infinite penalty weights). We perform several numerical experiments on simulated and real data illustrating the new algorithms, and provide a detailed discussion about more sophisticated optimization tools that can further be used to solve OT problems thanks to our reformulation.

MLFeb 19, 2020

Partial Optimal Transport with Applications on Positive-Unlabeled Learning

Laetitia Chapel, Mokhtar Z. Alaya, Gilles Gasso

Classical optimal transport problem seeks a transportation map that preserves the total mass betwenn two probability distributions, requiring their mass to be the same. This may be too restrictive in certain applications such as color or shape matching, since the distributions may have arbitrary masses and/or that only a fraction of the total mass has to be transported. Several algorithms have been devised for computing partial Wasserstein metrics that rely on an entropic regularization, but when it comes with exact solutions, almost no partial formulation of neither Wasserstein nor Gromov-Wasserstein are available yet. This precludes from working with distributions that do not lie in the same metric space or when invariance to rotation or translation is needed. In this paper, we address the partial Wasserstein and Gromov-Wasserstein problems and propose exact algorithms to solve them. We showcase the new formulation in a positive-unlabeled (PU) learning application. To the best of our knowledge, this is the first application of optimal transport in this context and we first highlight that partial Wasserstein-based metrics prove effective in usual PU learning settings. We then demonstrate that partial Gromov-Wasserstein metrics is efficient in scenario where point clouds come from different domains or have different features.

MLFeb 10, 2020

Time Series Alignment with Global Invariances

Titouan Vayer, Romain Tavenard, Laetitia Chapel et al.

Multivariate time series are ubiquitous objects in signal processing. Measuring a distance or similarity between two such objects is of prime interest in a variety of applications, including machine learning, but can be very difficult as soon as the temporal dynamics and the representation of the time series, {\em i.e.} the nature of the observed quantities, differ from one another. In this work, we propose a novel distance accounting both feature space and temporal variabilities by learning a latent global transformation of the feature space together with a temporal alignment, cast as a joint optimization problem. The versatility of our framework allows for several variants depending on the invariance class at stake. Among other contributions, we define a differentiable loss for time series and present two algorithms for the computation of time series barycenters under this new geometry. We illustrate the interest of our approach on both simulated and real world data and show the robustness of our approach compared to state-of-the-art methods.

MLMay 24, 2019

Sliced Gromov-Wasserstein

Titouan Vayer, Rémi Flamary, Romain Tavenard et al.

Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions whose supports do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex non convex quadratic program which is most of the time very costly both in time and memory. Contrary to GW, the Wasserstein distance (W) enjoys several properties (e.g. duality) that permit large scale optimization. Among those, the solution of W on the real line, that only requires sorting discrete samples in 1D, allows defining the Sliced Wasserstein (SW) distance. This paper proposes a new divergence based on GW akin to SW. We first derive a closed form for GW when dealing with 1D distributions, based on a new result for the related quadratic assignment problem. We then define a novel OT discrepancy that can deal with large scale distributions via a slicing approach and we show how it relates to the GW distance while being $O(n\log(n))$ to compute. We illustrate the behavior of this so called Sliced Gromov-Wasserstein (SGW) discrepancy in experiments where we demonstrate its ability to tackle similar problems as GW while being several order of magnitudes faster to compute.

MLMay 23, 2018

Optimal Transport for structured data with application on graphs

Titouan Vayer, Laetitia Chapel, Rémi Flamary et al.

This work considers the problem of computing distances between structured objects such as undirected graphs, seen as probability distributions in a specific metric space. We consider a new transportation distance (i.e. that minimizes a total cost of transporting probability masses) that unveils the geometric nature of the structured objects space. Unlike Wasserstein or Gromov-Wasserstein metrics that focus solely and respectively on features (by considering a metric in the feature space) or structure (by seeing structure as a metric space), our new distance exploits jointly both information, and is consequently called Fused Gromov-Wasserstein (FGW). After discussing its properties and computational aspects, we show results on a graph classification task, where our method outperforms both graph kernels and deep graph convolutional networks. Exploiting further on the metric properties of FGW, interesting geometric objects such as Fréchet means or barycenters of graphs are illustrated and discussed in a clustering context.

CVJul 9, 2016

Combining multiple resolutions into hierarchical representations for kernel-based image classification

Yanwei Cui, Sébastien Lefevre, Laetitia Chapel et al.

Geographic object-based image analysis (GEOBIA) framework has gained increasing interest recently. Following this popular paradigm, we propose a novel multiscale classification approach operating on a hierarchical image representation built from two images at different resolutions. They capture the same scene with different sensors and are naturally fused together through the hierarchical representation, where coarser levels are built from a Low Spatial Resolution (LSR) or Medium Spatial Resolution (MSR) image while finer levels are generated from a High Spatial Resolution (HSR) or Very High Spatial Resolution (VHSR) image. Such a representation allows one to benefit from the context information thanks to the coarser levels, and subregions spatial arrangement information thanks to the finer levels. Two dedicated structured kernels are then used to perform machine learning directly on the constructed hierarchical representation. This strategy overcomes the limits of conventional GEOBIA classification procedures that can handle only one or very few pre-selected scales. Experiments run on an urban classification task show that the proposed approach can highly improve the classification accuracy w.r.t. conventional approaches working on a single scale.

CVJun 15, 2016

Combining multiscale features for classification of hyperspectral images: a sequence based kernel approach

Yanwei Cui, Laetitia Chapel, Sébastien Lefèvre

Nowadays, hyperspectral image classification widely copes with spatial information to improve accuracy. One of the most popular way to integrate such information is to extract hierarchical features from a multiscale segmentation. In the classification context, the extracted features are commonly concatenated into a long vector (also called stacked vector), on which is applied a conventional vector-based machine learning technique (e.g. SVM with Gaussian kernel). In this paper, we rather propose to use a sequence structured kernel: the spectrum kernel. We show that the conventional stacked vector-based kernel is actually a special case of this kernel. Experiments conducted on various publicly available hyperspectral datasets illustrate the improvement of the proposed kernel w.r.t. conventional ones using the same hierarchical spatial features.

CVApr 6, 2016

A Subpath Kernel for Learning Hierarchical Image Representations

Yanwei Cui, Laetitia Chapel, Sébastien Lefèvre

Tree kernels have demonstrated their ability to deal with hierarchical data, as the intrinsic tree structure often plays a discriminative role. While such kernels have been successfully applied to various domains such as nature language processing and bioinformatics, they mostly concentrate on ordered trees and whose nodes are described by symbolic data. Meanwhile, hierarchical representations have gained increasing interest to describe image content. This is particularly true in remote sensing, where such representations allow for revealing different objects of interest at various scales through a tree structure. However, the induced trees are unordered and the nodes are equipped with numerical features. In this paper, we propose a new structured kernel for hierarchical image representations which is built on the concept of subpath kernel. Experimental results on both artificial and remote sensing datasets show that the proposed kernel manages to deal with the hierarchical nature of the data, leading to better classification rates.

LGJan 8, 2016

Dense Bag-of-Temporal-SIFT-Words for Time Series Classification

Adeline Bailly, Simon Malinowski, Romain Tavenard et al.

Time series classification is an application of particular interest with the increase of data to monitor. Classical techniques for time series classification rely on point-to-point distances. Recently, Bag-of-Words approaches have been used in this context. Words are quantized versions of simple features extracted from sliding windows. The SIFT framework has proved efficient for image classification. In this paper, we design a time series classification scheme that builds on the SIFT framework adapted to time series to feed a Bag-of-Words. We then refine our method by studying the impact of normalized Bag-of-Words, as well as densely extract point descriptors. Proposed adjustements achieve better performance. The evaluation shows that our method outperforms classical techniques in terms of classification.