LGOct 17, 2023Code
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal TransportQuentin Bouniot, Ievgen Redko, Anton Mallasto et al.
In the last decade, we have witnessed the introduction of several novel deep neural network (DNN) architectures exhibiting ever-increasing performance across diverse tasks. Explaining the upward trend of their performance, however, remains difficult as different DNN architectures of comparable depth and width -- common factors associated with their expressive power -- may exhibit a drastically different performance even when trained on the same dataset. In this paper, we introduce the concept of the non-linearity signature of DNN, the first theoretically sound solution for approximately measuring the non-linearity of deep neural networks. Built upon a score derived from closed-form optimal transport mappings, this signature provides a better understanding of the inner workings of a wide range of DNN architectures and learning paradigms, with a particular emphasis on the computer vision task. We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature and its potential for long-reaching implications. The code for our work is available at https://github.com/qbouniot/AffScoreDeep
MLJun 1
Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/MinimizationLouise Davy, Stephan Clémençon, Charlotte Laclau
Many machine learning problems, including similarity learning, ranking, and clustering, rely on empirical pairwise loss functions whose quadratic computational cost quickly becomes prohibitive at scale. We demonstrate how a frugal approach that retains only a fraction of the available information on pairs can achieve estimation or optimization performance comparable to that obtained by using all pairs, by leveraging survey sampling techniques. A central finding, supported by both theory and experiments, is that such sampling plans must target pairs directly rather than individual observations. In particular, for pairwise losses between high-dimensional vectors such as embeddings in vision or graph learning, assigning higher inclusion probabilities to informative pairs using suitable auxiliary information yields performance close to full pairwise evaluation, providing a principled and theoretically grounded trade-off between accuracy and computational cost.
LGMay 11, 2022
A Survey on Fairness for Machine Learning on GraphsCharlotte Laclau, Christine Largeron, Manvi Choudhary
Nowadays, the analysis of complex phenomena modeled by graphs plays a crucial role in many real-world application domains where decisions can have a strong societal impact. However, numerous studies and papers have recently revealed that machine learning models could lead to potential disparate treatment between individuals and unfair outcomes. In that context, algorithmic contributions for graph mining are not spared by the problem of fairness and present some specific challenges related to the intrinsic nature of graphs: (1) graph data is non-IID, and this assumption may invalidate many existing studies in fair machine learning, (2) suited metric definitions to assess the different types of fairness with relational data and (3) algorithmic challenge on the difficulty of finding a good trade-off between model accuracy and fairness. This survey is the first one dedicated to fairness for relational data. It aims to present a comprehensive review of state-of-the-art techniques in fairness on graph mining and identify the open challenges and future trends. In particular, we start by presenting several sensible application domains and the associated graph mining tasks with a focus on edge prediction and node classification in the sequel. We also recall the different metrics proposed to evaluate potential bias at different levels of the graph mining process; then we provide a comprehensive overview of recent contributions in the domain of fair machine learning for graphs, that we classify into pre-processing, in-processing and post-processing models. We also propose to describe existing graph data, synthetic and real-world benchmarks. Finally, we present in detail five potential promising directions to advance research in studying algorithmic fairness on graphs.
LGMar 14, 2022
SimHawNet: A Modified Hawkes Process for Temporal Network SimulationMathilde Perez, Raphaël Romero, Bo Kang et al.
Temporal networks allow representing connections between objects while incorporating the temporal dimension. While static network models can capture unchanging topological regularities, they often fail to model the effects associated with the causal generative process of the network that occurs in time. Hence, exploiting the temporal aspect of networks has been the focus of many recent studies. In this context, we propose a new framework for generative models of continuous-time temporal networks. We assume that the activation of the edges in a temporal network is driven by a specified temporal point process. This approach allows to directly model the waiting time between events while incorporating time-varying history-based features as covariates in the predictions. Coupled with a thinning algorithm designed for the simulation of point processes, SimHawNet enables simulation of the evolution of temporal networks in continuous time. Finally, we introduce a comprehensive evaluation framework to assess the performance of such an approach, in which we demonstrate that SimHawNet successfully simulates the evolution of networks with very different generative processes and achieves performance comparable to the state of the art, while being significantly faster.
SIMar 4
How Predicted Links Influence Network Evolution: Disentangling Choice and Algorithmic Feedback in Dynamic GraphsMathilde Perez, Raphaël Romero, Jefrey Lijffijt et al.
Link prediction models are increasingly used to recommend interactions in evolving networks, yet their impact on network structure is typically assessed from static snapshots. In particular, observed homophily conflates intrinsic interaction tendencies with amplification effects induced by network dynamics and algorithmic feedback. We propose a temporal framework based on multivariate Hawkes processes that disentangles these two sources and introduce an instantaneous bias measure derived from interaction intensities, capturing current reinforcement dynamics beyond cumulative metrics. We provide a theoretical characterization of the stability and convergence of the induced dynamics, and experiments show that the proposed measure reliably reflects algorithmic feedback effects across different link prediction strategies.
LGMay 19
MSAlign: Aligning Molecule and Mass Spectra Foundation Models for Metabolite IdentificationPaul Krzakala, Gabriel Melo, Camille Lançon et al.
Accurately identifying metabolites i.e. small molecules from mass spectrometry data remains a core challenge in metabolomics, with broad applications in drug discovery, environmental analysis, and clinical research. We address the Molecule Retrieval task, which consists in recovering the chemical structure of a metabolite from its MS/MS spectrum given a set of candidate molecules. While the recent release of benchmark datasets such as MassSpecGym and Spectraverse has considerably accelerated the development of novel machine learning approaches, the complexity of data preprocessing pipelines and the lack of unified implementations make methods and results difficult to reproduce and compare. We make three contributions. First, we propose a unified framework encompassing recent approaches based on representation alignment and contrastive learning. Second, we introduce MSAlign, inspired by multimodal alignment in vision-language models, which learns a shared representation space by aligning two frozen foundation models (DreaMS for mass spectra and ChemBERTa for molecules) through lightweight MLP projections trained with a candidate-based contrastive objective. MSAlign is simple to implement, fast to train and consistently outperforms existing approaches across all benchmarks. Third, we investigate a long-standing evaluation problem: data splitting strategies in molecule retrieval implicitly trade off data leakage against domain shift. We formalize this tension by introducing a quantitative measure of distribution shift, and use it to evaluate splitting strategies in existing benchmarks. All datasets, splits, candidate sets, and a unified implementation of MSAlign and baselines are publicly released to support reproducible research.
CLNov 21, 2023
Fair Text Classification with Wasserstein IndependenceThibaud Leteno, Antoine Gourru, Charlotte Laclau et al.
Group fairness is a central research topic in text classification, where reaching fair treatment between sensitive groups (e.g. women vs. men) remains an open challenge. This paper presents a novel method for mitigating biases in neural text classification, agnostic to the model architecture. Considering the difficulty to distinguish fair from unfair information in a text encoder, we take inspiration from adversarial training to induce Wasserstein independence between representations learned to predict our target label and the ones learned to predict some sensitive attribute. Our approach provides two significant advantages. Firstly, it does not require annotations of sensitive attributes in both testing and training data. This is more suitable for real-life scenarios compared to existing methods that require annotations of sensitive attributes at train time. Second, our approach exhibits a comparable or better fairness-accuracy trade-off compared to existing methods.
LGFeb 9
Drop the mask! GAMM-A Taxonomy for Graph Attributes Missing MechanismsRichard Serrano, Baptiste Jeudy, Charlotte Laclau et al.
Exploring missing data in attributed graphs introduces unique challenges beyond those found in tabular datasets. In this work, we extend the taxonomy for missing data mechanisms to attributed graphs by proposing GAMM (Graph Attributes Missing Mechanisms), a framework that systematically links missingness probability to both node attributes and the underlying graph structure. Our taxonomy enriches the conventional definitions of masking mechanisms by introducing graph-specific dependencies. We empirically demonstrate that state-of-the-art imputation methods, while effective on traditional masks, significantly struggle when confronted with these more realistic graph-aware missingness scenarios.
LGFeb 8
Homophily-aware Supervised Contrastive Counterfactual Augmented Fair Graph Neural NetworkMahdi Tavassoli Kejani, Fadi Dornaika, Charlotte Laclau et al.
In recent years, Graph Neural Networks (GNNs) have achieved remarkable success in tasks such as node classification, link prediction, and graph representation learning. However, they remain susceptible to biases that can arise not only from node attributes but also from the graph structure itself. Addressing fairness in GNNs has therefore emerged as a critical research challenge. In this work, we propose a novel model for training fairness-aware GNNs by improving the counterfactual augmented fair graph neural network framework (CAF). Specifically, our approach introduces a two-phase training strategy: in the first phase, we edit the graph to increase homophily ratio with respect to class labels while reducing homophily ratio with respect to sensitive attribute labels; in the second phase, we integrate a modified supervised contrastive loss and environmental loss into the optimization process, enabling the model to jointly improve predictive performance and fairness. Experiments on five real-world datasets demonstrate that our model outperforms CAF and several state-of-the-art graph-based learning methods in both classification accuracy and fairness metrics.
LGFeb 12
TopoFair: Linking Topological Bias to Fairness in Link Prediction BenchmarksLilian Marey, Mathilde Perez, Tiphaine Viard et al.
Graph link prediction (LP) plays a critical role in socially impactful applications, such as job recommendation and friendship formation. Ensuring fairness in this task is thus essential. While many fairness-aware methods manipulate graph structures to mitigate prediction disparities, the topological biases inherent to social graph structures remain poorly understood and are often reduced to homophily alone. This undermines the generalization potential of fairness interventions and limits their applicability across diverse network topologies. In this work, we propose a novel benchmarking framework for fair LP, centered on the structural biases of the underlying graphs. We begin by reviewing and formalizing a broad taxonomy of topological bias measures relevant to fairness in graphs. In parallel, we introduce a flexible graph generation method that simultaneously ensures fidelity to real-world graph patterns and enables controlled variation across a wide spectrum of structural biases. We apply this framework to evaluate both classical and fairness-aware LP models across multiple use cases. Our results provide a fine-grained empirical analysis of the interactions between predictive fairness and structural biases. This new perspective reveals the sensitivity of fairness interventions to beyond-homophily biases and underscores the need for structurally grounded fairness evaluations in graph learning.
LGMar 4
k-hop Fairness: Addressing Disparities in Graph Link Prediction Beyond First-Order NeighborhoodsLilian Marey, Tiphaine Viard, Charlotte Laclau
Link prediction (LP) plays a central role in graph-based applications, particularly in social recommendation. However, real-world graphs often reflect structural biases, most notably homophily, the tendency of nodes with similar attributes to connect. While this property can improve predictive performance, it also risks reinforcing existing social disparities. In response, fairness-aware LP methods have emerged, often seeking to mitigate these effects by promoting inter-group connections, that is, links between nodes with differing sensitive attributes (e.g., gender), following the principle of dyadic fairness. However, dyadic fairness overlooks potential disparities within the sensitive groups themselves. To overcome this issue, we propose $k$-hop fairness, a structural notion of fairness for LP, that assesses disparities conditioned on the distance between nodes in the graph. We formalize this notion through predictive fairness and structural bias metrics, and propose pre- and post-processing mitigation strategies. Experiments across standard LP benchmarks reveal: (1) a strong tendency of models to reproduce structural biases at different $k$-hops; (2) interdependence between structural biases at different hops when rewiring graphs; and (3) that our post-processing method achieves favorable $k$-hop performance-fairness trade-offs compared to existing fair LP baselines.
CLJan 12, 2024
An investigation of structures responsible for gender bias in BERT and DistilBERTThibaud Leteno, Antoine Gourru, Charlotte Laclau et al.
In recent years, large Transformer-based Pre-trained Language Models (PLM) have changed the Natural Language Processing (NLP) landscape, by pushing the performance boundaries of the state-of-the-art on a wide variety of tasks. However, this performance gain goes along with an increase in complexity, and as a result, the size of such models (up to billions of parameters) represents a constraint for their deployment on embedded devices or short-inference time tasks. To cope with this situation, compressed models emerged (e.g. DistilBERT), democratizing their usage in a growing number of applications that impact our daily lives. A crucial issue is the fairness of the predictions made by both PLMs and their distilled counterparts. In this paper, we propose an empirical exploration of this problem by formalizing two questions: (1) Can we identify the neural mechanism(s) responsible for gender bias in BERT (and by extension DistilBERT)? (2) Does distillation tend to accentuate or mitigate gender bias (e.g. is DistilBERT more prone to gender bias than its uncompressed version, BERT)? Our findings are the following: (I) one cannot identify a specific layer that produces bias; (II) every attention head uniformly encodes bias; except in the context of underrepresented classes with a high imbalance of the sensitive attribute; (III) this subset of heads is different as we re-fine tune the network; (IV) bias is more homogeneously produced by the heads in the distilled model.
LGFeb 19, 2024
Any2Graph: Deep End-To-End Supervised Graph Prediction With An Optimal Transport LossPaul Krzakala, Junjie Yang, Rémi Flamary et al.
We propose Any2graph, a generic framework for end-to-end Supervised Graph Prediction (SGP) i.e. a deep learning model that predicts an entire graph for any kind of input. The framework is built on a novel Optimal Transport loss, the Partially-Masked Fused Gromov-Wasserstein, that exhibits all necessary properties (permutation invariance, differentiability and scalability) and is designed to handle any-sized graphs. Numerical experiments showcase the versatility of the approach that outperform existing competitors on a novel challenging synthetic dataset and a variety of real-world tasks such as map construction from satellite image (Sat2Graph) or molecule prediction from fingerprint (Fingerprint2Graph).
IRMay 6, 2025
Modeling Musical Genre Trajectories through Pathlet LearningLilian Marey, Charlotte Laclau, Bruno Sguerra et al.
The increasing availability of user data on music streaming platforms opens up new possibilities for analyzing music consumption. However, understanding the evolution of user preferences remains a complex challenge, particularly as their musical tastes change over time. This paper uses the dictionary learning paradigm to model user trajectories across different musical genres. We define a new framework that captures recurring patterns in genre trajectories, called pathlets, enabling the creation of comprehensible trajectory embeddings. We show that pathlet learning reveals relevant listening patterns that can be analyzed both qualitatively and quantitatively. This work improves our understanding of users' interactions with music and opens up avenues of research into user behavior and fostering diversity in recommender systems. A dataset of 2000 user histories tagged by genre over 17 months, supplied by Deezer (a leading music streaming company), is also released with the code.
CLJan 28, 2025
Histoires Morales: A French Dataset for Assessing Moral AlignmentThibaud Leteno, Irina Proskurina, Antoine Gourru et al.
Aligning language models with human values is crucial, especially as they become more integrated into everyday life. While models are often adapted to user preferences, it is equally important to ensure they align with moral norms and behaviours in real-world social situations. Despite significant progress in languages like English and Chinese, French has seen little attention in this area, leaving a gap in understanding how LLMs handle moral reasoning in this language. To address this gap, we introduce Histoires Morales, a French dataset derived from Moral Stories, created through translation and subsequently refined with the assistance of native speakers to guarantee grammatical accuracy and adaptation to the French cultural context. We also rely on annotations of the moral values within the dataset to ensure their alignment with French norms. Histoires Morales covers a wide range of social situations, including differences in tipping practices, expressions of honesty in relationships, and responsibilities toward animals. To foster future research, we also conduct preliminary experiments on the alignment of multilingual models on French and English data and the robustness of the alignment. We find that while LLMs are generally aligned with human moral norms by default, they can be easily influenced with user-preference optimization for both moral and immoral data.
LGMay 28, 2025
The quest for the GRAph Level autoEncoder (GRALE)Paul Krzakala, Gabriel Melo, Charlotte Laclau et al.
Although graph-based learning has attracted a lot of attention, graph representation learning is still a challenging task whose resolution may impact key application fields such as chemistry or biology. To this end, we introduce GRALE, a novel graph autoencoder that encodes and decodes graphs of varying sizes into a shared embedding space. GRALE is trained using an Optimal Transport-inspired loss that compares the original and reconstructed graphs and leverages a differentiable node matching module, which is trained jointly with the encoder and decoder. The proposed attention-based architecture relies on Evoformer, the core component of AlphaFold, which we extend to support both graph encoding and decoding. We show, in numerical experiments on simulated and molecular data, that GRALE enables a highly general form of pre-training, applicable to a wide range of downstream tasks, from classification and regression to more complex tasks such as graph interpolation, editing, matching, and prediction.
LGMar 10, 2025
Fair Text Classification via Transferable RepresentationsThibaud Leteno, Michael Perrot, Charlotte Laclau et al.
Group fairness is a central research topic in text classification, where reaching fair treatment between sensitive groups (e.g., women and men) remains an open challenge. We propose an approach that extends the use of the Wasserstein Dependency Measure for learning unbiased neural text classifiers. Given the challenge of distinguishing fair from unfair information in a text encoder, we draw inspiration from adversarial training by inducing independence between representations learned for the target label and those for a sensitive attribute. We further show that Domain Adaptation can be efficiently leveraged to remove the need for access to the sensitive attributes in the dataset we cure. We provide both theoretical and empirical evidence that our approach is well-founded.
IRFeb 26, 2022
Learning over No-Preferred and Preferred Sequence of Items for Robust Recommendation (Extended Abstract)Aleksandra Burashnikova, Yury Maximov, Marianne Clausel et al.
This paper is an extended version of [Burashnikova et al., 2021, arXiv: 2012.06910], where we proposed a theoretically supported sequential strategy for training a large-scale Recommender System (RS) over implicit feedback, mainly in the form of clicks. The proposed approach consists in minimizing pairwise ranking loss over blocks of consecutive items constituted by a sequence of non-clicked items followed by a clicked one for each user. We present two variants of this strategy where model parameters are updated using either the momentum method or a gradient-based approach. To prevent updating the parameters for an abnormally high number of clicks over some targeted items (mainly due to bots), we introduce an upper and a lower threshold on the number of updates for each user. These thresholds are estimated over the distribution of the number of blocks in the training set. They affect the decision of RS by shifting the distribution of items that are shown to the users. Furthermore, we provide a convergence analysis of both algorithms and demonstrate their practical efficiency over six large-scale collections with respect to various ranking measures.
IRDec 12, 2020
Learning over no-Preferred and Preferred Sequence of items for Robust RecommendationAleksandra Burashnikova, Marianne Clausel, Charlotte Laclau et al.
In this paper, we propose a theoretically founded sequential strategy for training large-scale Recommender Systems (RS) over implicit feedback, mainly in the form of clicks. The proposed approach consists in minimizing pairwise ranking loss over blocks of consecutive items constituted by a sequence of non-clicked items followed by a clicked one for each user. We present two variants of this strategy where model parameters are updated using either the momentum method or a gradient-based approach. To prevent from updating the parameters for an abnormally high number of clicks over some targeted items (mainly due to bots), we introduce an upper and a lower threshold on the number of updates for each user. These thresholds are estimated over the distribution of the number of blocks in the training set. The thresholds affect the decision of RS and imply a shift over the distribution of items that are shown to the users. Furthermore, we provide a convergence analysis of both algorithms and demonstrate their practical efficiency over six large-scale collections, both regarding different ranking measures and computational time.
LGOct 30, 2020
All of the Fairness for Edge Prediction with Optimal TransportCharlotte Laclau, Ievgen Redko, Manvi Choudhary et al.
Machine learning and data mining algorithms have been increasingly used recently to support decision-making systems in many areas of high societal importance such as healthcare, education, or security. While being very efficient in their predictive abilities, the deployed algorithms sometimes tend to learn an inductive model with a discriminative bias due to the presence of this latter in the learning sample. This problem gave rise to a new field of algorithmic fairness where the goal is to correct the discriminative bias introduced by a certain attribute in order to decorrelate it from the model's output. In this paper, we study the problem of fairness for the task of edge prediction in graphs, a largely underinvestigated scenario compared to a more popular setting of fair classification. To this end, we formulate the problem of fair edge prediction, analyze it theoretically, and propose an embedding-agnostic repairing procedure for the adjacency matrix of an arbitrary graph with a trade-off between the group and individual fairness. We experimentally show the versatility of our approach and its capacity to provide explicit control over different notions of fairness and prediction accuracy.
LGOct 21, 2020
Deep Neural Networks Are Congestion Games: From Loss Landscape to Wardrop Equilibrium and BeyondNina Vesseron, Ievgen Redko, Charlotte Laclau
The theoretical analysis of deep neural networks (DNN) is arguably among the most challenging research directions in machine learning (ML) right now, as it requires from scientists to lay novel statistical learning foundations to explain their behaviour in practice. While some success has been achieved recently in this endeavour, the question on whether DNNs can be analyzed using the tools from other scientific fields outside the ML community has not received the attention it may well have deserved. In this paper, we explore the interplay between DNNs and game theory (GT), and show how one can benefit from the classic readily available results from the latter when analyzing the former. In particular, we consider the widely studied class of congestion games, and illustrate their intrinsic relatedness to both linear and non-linear DNNs and to the properties of their loss surface. Beyond retrieving the state-of-the-art results from the literature, we argue that our work provides a very promising novel tool for analyzing the DNNs and support this claim by proposing concrete open problems that can advance significantly our understanding of DNNs when solved.
LGSep 1, 2020
Rank-one partitioning: formalization, illustrative examples, and a new cluster enhancing strategyCharlotte Laclau, Franck Iutzeler, Ievgen Redko
In this paper, we introduce and formalize a rank-one partitioning learning paradigm that unifies partitioning methods that proceed by summarizing a data set using a single vector that is further used to derive the final clustering partition. Using this unification as a starting point, we propose a novel algorithmic solution for the partitioning problem based on rank-one matrix factorization and denoising of piecewise constant signals. Finally, we propose an empirical demonstration of our findings and demonstrate the robustness of the proposed denoising step. We believe that our work provides a new point of view for several unsupervised learning techniques that helps to gain a deeper understanding about the general mechanisms of data partitioning.
CLMay 11, 2018
Cross-lingual Document Retrieval using Regularized Wasserstein DistanceGeorgios Balikas, Charlotte Laclau, Ievgen Redko et al.
Many information retrieval algorithms rely on the notion of a good distance that allows to efficiently compare objects of different nature. Recently, a new promising metric called Word Mover's Distance was proposed to measure the divergence between text passages. In this paper, we demonstrate that this metric can be extended to incorporate term-weighting schemes and provide more accurate and computationally efficient matching between documents using entropic regularization. We evaluate the benefits of both extensions in the task of cross-lingual document retrieval (CLDR). Our experimental results on eight CLDR problems suggest that the proposed methods achieve remarkable improvements in terms of Mean Reciprocal Rank compared to several baselines.
MLMay 17, 2017
Co-clustering through Optimal TransportCharlotte Laclau, Ievgen Redko, Basarab Matei et al.
In this paper, we present a novel method for co-clustering, an unsupervised learning approach that aims at discovering homogeneous groups of data instances and features by grouping them simultaneously. The proposed method uses the entropy regularized optimal transport between empirical measures defined on data instances and features in order to obtain an estimated joint probability density function represented by the optimal coupling matrix. This matrix is further factorized to obtain the induced row and columns partitions using multiscale representations approach. To justify our method theoretically, we show how the solution of the regularized optimal transport can be seen from the variational inference perspective thus motivating its use for co-clustering. The algorithm derived for the proposed method and its kernelized version based on the notion of Gromov-Wasserstein distance are fast, accurate and can determine automatically the number of both row and column clusters. These features are vividly demonstrated through extensive experimental evaluations.
MLApr 29, 2017
Representation Learning and Pairwise Ranking for Implicit Feedback in Recommendation SystemsSumit Sidana, Mikhail Trofimov, Oleg Horodnitskii et al.
In this paper, we propose a novel ranking framework for collaborative filtering with the overall aim of learning user preferences over items by minimizing a pairwise ranking loss. We show the minimization problem involves dependent random variables and provide a theoretical analysis by proving the consistency of the empirical risk minimization in the worst case where all users choose a minimal number of positive and negative items. We further derive a Neural-Network model that jointly learns a new representation of users and items in an embedded space as well as the preference relation of users over the pairs of items. The learning objective is based on three scenarios of ranking losses that control the ability of the model to maintain the ordering over the items induced from the users' preferences, as well as, the capacity of the dot-product defined in the learned embedded space to produce the ordering. The proposed model is by nature suitable for implicit feedback and involves the estimation of only very few parameters. Through extensive experiments on several real-world benchmarks on implicit data, we show the interest of learning the preference and the embedding simultaneously when compared to learning those separately. We also demonstrate that our approach is very competitive with the best state-of-the-art collaborative filtering techniques proposed for implicit feedback.