LGJun 6, 2023Code
Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual LearningNicolas Michel, Giovanni Chierchia, Romain Negrel et al.
We use the maximum a posteriori estimation principle for learning representations distributed on the unit sphere. We propose to use the angular Gaussian distribution, which corresponds to a Gaussian projected on the unit-sphere and derive the associated loss function. We also consider the von Mises-Fisher distribution, which is the conditional of a Gaussian in the unit-sphere. The learned representations are pushed toward fixed directions, which are the prior means of the Gaussians; allowing for a learning strategy that is resilient to data drift. This makes it suitable for online continual learning, which is the problem of training neural networks on a continuous data stream, where multiple classification tasks are presented sequentially so that data from past tasks are no longer accessible, and data from the current task can be seen only once. To address this challenging scenario, we propose a memory-based representation learning technique equipped with our new loss functions. Our approach does not require negative data or knowledge of task boundaries and performs well with smaller batch sizes while being computationally efficient. We demonstrate with extensive experiments that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries. For reproducibility, we use the same training pipeline for every compared method and share the code at https://github.com/Nicolas1203/ocl-fd.
NAMar 20, 2014
Epigraphical splitting for solving constrained convex formulations of inverse problems with proximal toolsGiovanni Chierchia, Nelly Pustelnik, Jean-Christophe Pesquet et al.
We propose a proximal approach to deal with a class of convex variational problems involving nonlinear constraints. A large family of constraints, proven to be effective in the solution of inverse problems, can be expressed as the lower level set of a sum of convex functions evaluated over different, but possibly overlapping, blocks of the signal. For such constraints, the associated projection operator generally does not have a simple form. We circumvent this difficulty by splitting the lower level set into as many epigraphs as functions involved in the sum. A closed half-space constraint is also enforced, in order to limit the sum of the introduced epigraphical variables to the upper bound of the original lower level set. In this paper, we focus on a family of constraints involving linear transforms of distance functions to a convex set or $\ell_{1,p}$ norms with $p\in \{1,2,\infty\}$. In these cases, the projection onto the epigraph of the involved function has a closed form expression. The proposed approach is validated in the context of image restoration with missing samples, by making use of constraints based on Non-Local Total Variation. Experiments show that our method leads to significant improvements in term of convergence speed over existing algorithms for solving similar constrained problems. A second application to a pulse shape design problem is provided in order to illustrate the flexibility of the proposed approach.
LGOct 2, 2023
SWMLP: Shared Weight Multilayer Perceptron for Car Trajectory Speed Prediction using Road Topographical FeaturesSarah Almeida Carneiro, Giovanni Chierchia, Jean Charléty et al.
Although traffic is one of the massively collected data, it is often only available for specific regions. One concern is that, although there are studies that give good results for these data, the data from these regions may not be sufficiently representative to describe all the traffic patterns in the rest of the world. In quest of addressing this concern, we propose a speed prediction method that is independent of large historical speed data. To predict a vehicle's speed, we use the trajectory road topographical features to fit a Shared Weight Multilayer Perceptron learning model. Our results show significant improvement, both qualitative and quantitative, over standard regression analysis. Moreover, the proposed framework sheds new light on the way to design new approaches for traffic analysis.
LGJul 12, 2022
Contrastive Learning for Online Semi-Supervised General Continual LearningNicolas Michel, Romain Negrel, Giovanni Chierchia et al.
We study Online Continual Learning with missing labels and propose SemiCon, a new contrastive loss designed for partly labeled data. We demonstrate its efficiency by devising a memory-based method trained on an unlabeled data stream, where every data added to memory is labeled using an oracle. Our approach outperforms existing semi-supervised methods when few labels are available, and obtain similar results to state-of-the-art supervised methods while using only 2.6% of labels on Split-CIFAR10 and 10% of labels on Split-CIFAR100.
LGSep 13, 2023
Domain-Aware Augmentations for Unsupervised Online General Continual LearningNicolas Michel, Romain Negrel, Giovanni Chierchia et al.
Continual Learning has been challenging, especially when dealing with unsupervised scenarios such as Unsupervised Online General Continual Learning (UOGCL), where the learning agent has no prior knowledge of class boundaries or task change information. While previous research has focused on reducing forgetting in supervised setups, recent studies have shown that self-supervised learners are more resilient to forgetting. This paper proposes a novel approach that enhances memory usage for contrastive learning in UOGCL by defining and using stream-dependent data augmentations together with some implementation tricks. Our proposed method is simple yet effective, achieves state-of-the-art results compared to other unsupervised approaches in all considered setups, and reduces the gap between supervised and unsupervised continual learning. Our domain-aware augmentation procedure can be adapted to other replay-based methods, making it a promising strategy for continual learning.
LGDec 15, 2025
Network-Wide Traffic Volume Estimation from Speed Profiles using a Spatio-Temporal Graph Neural Network with Directed Spatial AttentionLéo Hein, Giovanni de Nunzio, Giovanni Chierchia et al.
Existing traffic volume estimation methods typically address either forecasting traffic on sensor-equipped roads or spatially imputing missing volumes using nearby sensors. While forecasting models generally disregard unmonitored roads by design, spatial imputation methods explicitly address network-wide estimation; yet this approach relies on volume data at inference time, limiting its applicability in sensor-scarce cities. Unlike traffic volume data, probe vehicle speeds and static road attributes are more broadly accessible and support full coverage of road segments in most urban networks. In this work, we present the Hybrid Directed-Attention Spatio-Temporal Graph Neural Network (HDA-STGNN), an inductive deep learning framework designed to tackle the network-wide volume estimation problem. Our approach leverages speed profiles, static road attributes, and road network topology to predict daily traffic volume profiles across all road segments in the network. To evaluate the effectiveness of our approach, we perform extensive ablation studies that demonstrate the model's capacity to capture complex spatio-temporal dependencies and highlight the value of topological information for accurate network-wide traffic volume estimation without relying on volume data at inference time.
LGMay 25, 2019Code
Ultrametric Fitting by Gradient DescentGiovanni Chierchia, Benjamin Perret
We study the problem of fitting an ultrametric distance to a dissimilarity graph in the context of hierarchical cluster analysis. Standard hierarchical clustering methods are specified procedurally, rather than in terms of the cost function to be optimized. We aim to overcome this limitation by presenting a general optimization framework for ultrametric fitting. Our approach consists of modeling the latter as a constrained optimization problem over the continuous space of ultrametrics. So doing, we can leverage the simple, yet effective, idea of replacing the ultrametric constraint with a min-max operation injected directly into the cost function. The proposed reformulation leads to an unconstrained optimization problem that can be efficiently solved by gradient descent methods. The flexibility of our framework allows us to investigate several cost functions, following the classic paradigm of combining a data fidelity term with a regularization. While we provide no theoretical guarantee to find the global optimum, the numerical results obtained over a number of synthetic and real datasets demonstrate the good performance of our approach with respect to state-of-the-art agglomerative algorithms. This makes us believe that the proposed framework sheds new light on the way to design a new generation of hierarchical clustering methods. Our code is made publicly available at \url{https://github.com/PerretB/ultrametric-fitting}.
AIFeb 12, 2024
Clustering Dynamics for Improved Speed Prediction Deriving from Topographical GPS RegistrationsSarah Almeida Carneiro, Giovanni Chierchia, Aurelie Pirayre et al.
A persistent challenge in the field of Intelligent Transportation Systems is to extract accurate traffic insights from geographic regions with scarce or no data coverage. To this end, we propose solutions for speed prediction using sparse GPS data points and their associated topographical and road design features. Our goal is to investigate whether we can use similarities in the terrain and infrastructure to train a machine learning model that can predict speed in regions where we lack transportation data. For this we create a Temporally Orientated Speed Dictionary Centered on Topographically Clustered Roads, which helps us to provide speed correlations to selected feature configurations. Our results show qualitative and quantitative improvement over new and standard regression methods. The presented framework provides a fresh perspective on devising strategies for missing data traffic analysis.
LGSep 1, 2023
New metrics for analyzing continual learnersNicolas Michel, Giovanni Chierchia, Romain Negrel et al.
Deep neural networks have shown remarkable performance when trained on independent and identically distributed data from a fixed set of classes. However, in real-world scenarios, it can be desirable to train models on a continuous stream of data where multiple classification tasks are presented sequentially. This scenario, known as Continual Learning (CL) poses challenges to standard learning algorithms which struggle to maintain knowledge of old tasks while learning new ones. This stability-plasticity dilemma remains central to CL and multiple metrics have been proposed to adequately measure stability and plasticity separately. However, none considers the increasing difficulty of the classification task, which inherently results in performance loss for any model. In that sense, we analyze some limitations of current metrics and identify the presence of setup-induced forgetting. Therefore, we propose new metrics that account for the task's increasing difficulty. Through experiments on benchmark datasets, we demonstrate that our proposed metrics can provide new insights into the stability-plasticity trade-off achieved by models in the continual learning environment.
LGSep 9, 2021
FGOT: Graph Distances based on Filters and Optimal TransportHermina Petric Maretic, Mireille El Gheche, Giovanni Chierchia et al.
Graph comparison deals with identifying similarities and dissimilarities between graphs. A major obstacle is the unknown alignment of graphs, as well as the lack of accurate and inexpensive comparison metrics. In this work we introduce the filter graph distance. It is an optimal transport based distance which drives graph comparison through the probability distribution of filtered graph signals. This creates a highly flexible distance, capable of prioritising different spectral information in observed graphs, offering a wide range of choices for a comparison metric. We tackle the problem of graph alignment by computing graph permutations that minimise our new filter distances, which implicitly solves the graph comparison problem. We then propose a new approximate cost function that circumvents many computational difficulties inherent to graph comparison and permits the exploitation of fast algorithms such as mirror gradient descent, without grossly sacrificing the performance. We finally propose a novel algorithm derived from a stochastic version of mirror gradient descent, which accommodates the non-convexity of the alignment problem, offering a good trade-off between performance accuracy and speed. The experiments on graph alignment and classification show that the flexibility gained through filter graph distances can have a significant impact on performance, while the difference in speed offered by the approximation cost makes the framework applicable in practical settings.
CPNov 9, 2020
SuperDeConFuse: A Supervised Deep Convolutional Transform based Fusion Framework for Financial Trading SystemsPooja Gupta, Angshul Majumdar, Emilie Chouzenoux et al.
This work proposes a supervised multi-channel time-series learning framework for financial stock trading. Although many deep learning models have recently been proposed in this domain, most of them treat the stock trading time-series data as 2-D image data, whereas its true nature is 1-D time-series data. Since the stock trading systems are multi-channel data, many existing techniques treating them as 1-D time-series data are not suggestive of any technique to effectively fusion the information carried by the multiple channels. To contribute towards both of these shortcomings, we propose an end-to-end supervised learning framework inspired by the previously established (unsupervised) convolution transform learning framework. Our approach consists of processing the data channels through separate 1-D convolution layers, then fusing the outputs with a series of fully-connected layers, and finally applying a softmax classification layer. The peculiarity of our framework - SuperDeConFuse (SDCF), is that we remove the nonlinear activation located between the multi-channel convolution layers and the fully-connected layers, as well as the one located between the latter and the output layer. We compensate for this removal by introducing a suitable regularization on the aforementioned layer outputs and filters during the training phase. Specifically, we apply a logarithm determinant regularization on the layer filters to break symmetry and force diversity in the learnt transforms, whereas we enforce the non-negativity constraint on the layer outputs to mitigate the issue of dead neurons. This results in the effective learning of a richer set of features and filters with respect to a standard convolutional neural network. Numerical experiments confirm that the proposed model yields considerably better results than state-of-the-art deep learning techniques for real-world problem of stock trading.
LGNov 9, 2020
DeConFuse : A Deep Convolutional Transform based Unsupervised Fusion FrameworkPooja Gupta, Jyoti Maggu, Angshul Majumdar et al.
This work proposes an unsupervised fusion framework based on deep convolutional transform learning. The great learning ability of convolutional filters for data analysis is well acknowledged. The success of convolutive features owes to convolutional neural network (CNN). However, CNN cannot perform learning tasks in an unsupervised fashion. In a recent work, we show that such shortcoming can be addressed by adopting a convolutional transform learning (CTL) approach, where convolutional filters are learnt in an unsupervised fashion. The present paper aims at (i) proposing a deep version of CTL; (ii) proposing an unsupervised fusion formulation taking advantage of the proposed deep CTL representation; (iii) developing a mathematically sounded optimization strategy for performing the learning task. We apply the proposed technique, named DeConFuse, on the problem of stock forecasting and trading. Comparison with state-of-the-art methods (based on CNN and long short-term memory network) shows the superiority of our method for performing a reliable feature extraction.
LGNov 9, 2020
ConFuse: Convolutional Transform Learning Fusion Framework For Multi-Channel Data AnalysisPooja Gupta, Jyoti Maggu, Angshul Majumdar et al.
This work addresses the problem of analyzing multi-channel time series data %. In this paper, we by proposing an unsupervised fusion framework based on %the recently proposed convolutional transform learning. Each channel is processed by a separate 1D convolutional transform; the output of all the channels are fused by a fully connected layer of transform learning. The training procedure takes advantage of the proximal interpretation of activation functions. We apply the developed framework to multi-channel financial data for stock forecasting and trading. We compare our proposed formulation with benchmark deep time series analysis networks. The results show that our method yields considerably better results than those compared against.
LGOct 2, 2020
Deep Convolutional Transform Learning -- Extended versionJyoti Maggu, Angshul Majumdar, Emilie Chouzenoux et al.
This work introduces a new unsupervised representation learning technique called Deep Convolutional Transform Learning (DCTL). By stacking convolutional transforms, our approach is able to learn a set of independent kernels at different layers. The features extracted in an unsupervised manner can then be used to perform machine learning tasks, such as classification and clustering. The learning technique relies on a well-sounded alternating proximal minimization scheme with established convergence guarantees. Our experimental results show that the proposed DCTL technique outperforms its shallow version CTL, on several benchmark datasets.
LGMar 12, 2020
Wasserstein-based Graph AlignmentHermina Petric Maretic, Mireille El Gheche, Matthias Minder et al.
We propose a novel method for comparing non-aligned graphs of different sizes, based on the Wasserstein distance between graph signal distributions induced by the respective graph Laplacian matrices. Specifically, we cast a new formulation for the one-to-many graph alignment problem, which aims at matching a node in the smaller graph with one or more nodes in the larger graph. By integrating optimal transport in our graph comparison framework, we generate both a structurally-meaningful graph distance, and a signal transportation plan that models the structure of graph data. The resulting alignment problem is solved with stochastic gradient descent, where we use a novel Dykstra operator to ensure that the solution is a one-to-many (soft) assignment matrix. We demonstrate the performance of our novel framework on graph alignment and graph classification, and we show that our method leads to significant improvements with respect to the state-of-the-art algorithms for each of these tasks.
LGJun 5, 2019
GOT: An Optimal Transport framework for Graph comparisonHermina Petric Maretic, Mireille EL Gheche, Giovanni Chierchia et al.
We present a novel framework based on optimal transport for the challenging problem of comparing graphs. Specifically, we exploit the probabilistic distribution of smooth graph signals defined with respect to the graph topology. This allows us to derive an explicit expression of the Wasserstein distance between graph signal distributions in terms of the graph Laplacian matrices. This leads to a structurally meaningful measure for comparing graphs, which is able to take into account the global structure of graphs, while most other measures merely observe local changes independently. Our measure is then used for formulating a new graph alignment problem, whose objective is to estimate the permutation that minimizes the distance between two graphs. We further propose an efficient stochastic algorithm based on Bayesian exploration to accommodate for the non-convexity of the graph alignment problem. We finally demonstrate the performance of our novel framework on different tasks like graph alignment, graph classification and graph signal prediction, and we show that our method leads to significant improvement with respect to the-state-of-art algorithms.
LGDec 13, 2018
Stochastic Gradient Descent for Spectral Embedding with Implicit Orthogonality ConstraintMireille El Gheche, Giovanni Chierchia, Pascal Frossard
In this paper, we propose a scalable algorithm for spectral embedding. The latter is a standard tool for graph clustering. However, its computational bottleneck is the eigendecomposition of the graph Laplacian matrix, which prevents its application to large-scale graphs. Our contribution consists of reformulating spectral embedding so that it can be solved via stochastic optimization. The idea is to replace the orthogonality constraint with an orthogonalization matrix injected directly into the criterion. As the gradient can be computed through a Cholesky factorization, our reformulation allows us to develop an efficient algorithm based on mini-batch gradient descent. Experimental results, both on synthetic and real data, confirm the efficiency of the proposed method in term of execution speed with respect to similar existing techniques.
LGNov 2, 2018
OrthoNet: Multilayer Network Data ClusteringMireille El Gheche, Giovanni Chierchia, Pascal Frossard
Network data appears in very diverse applications, like biological, social, or sensor networks. Clustering of network nodes into categories or communities has thus become a very common task in machine learning and data mining. Network data comes with some information about the network edges. In some cases, this network information can even be given with multiple views or multiple layers, each one representing a different type of relationship between the network nodes. Increasingly often, network nodes also carry a feature vector. We propose in this paper to extend the node clustering problem, that commonly considers only the network information, to a problem where both the network information and the node features are considered together for learning a clustering-friendly representation of the feature space. Specifically, we design a generic two-step algorithm for multilayer network data clustering. The first step aggregates the different layers of network information into a graph representation given by the geometric mean of the network Laplacian matrices. The second step uses a neural net to learn a feature embedding that is consistent with the structure given by the network layers. We propose a novel algorithm for efficiently training the neural net via stochastic gradient descent, which encourages the neural net outputs to span the leading eigenvectors of the aggregated Laplacian matrix, in order to capture the pairwise interactions on the network, and provide a clustering-friendly representation of the feature space. We demonstrate with an extensive set of experiments on synthetic and real datasets that our method leads to a significant improvement w.r.t. state-of-the-art multilayer graph clustering algorithms, as it judiciously combines nodes features and network information in the node embedding algorithms.
CVMay 23, 2018
A Two-Stage Subspace Trust Region Approach for Deep Neural Network TrainingViacheslav Dudar, Giovanni Chierchia, Emilie Chouzenoux et al.
In this paper, we develop a novel second-order method for training feed-forward neural nets. At each iteration, we construct a quadratic approximation to the cost function in a low-dimensional subspace. We minimize this approximation inside a trust region through a two-stage procedure: first inside the embedded positive curvature subspace, followed by a gradient descent step. This approach leads to a fast objective function decay, prevents convergence to saddle points, and alleviates the need for manually tuning parameters. We show the good performance of the proposed algorithm on benchmark datasets.
OCDec 25, 2017
A Random Block-Coordinate Douglas-Rachford Splitting Method with Low Computational Complexity for Binary Logistic RegressionLuis M. Briceno-Arias, Giovanni Chierchia, Emilie Chouzenoux et al.
In this paper, we propose a new optimization algorithm for sparse logistic regression based on a stochastic version of the Douglas-Rachford splitting method. Our algorithm sweeps the training set by randomly selecting a mini-batch of data at each iteration, and it allows us to update the variables in a block coordinate manner. Our approach leverages the proximity operator of the logistic loss, which is expressed with the generalized Lambert W function. Experiments carried out on standard datasets demonstrate the efficiency of our approach w.r.t. stochastic gradient-like methods.
CVMar 21, 2014
A Non-Local Structure Tensor Based Approach for Multicomponent Image Recovery ProblemsGiovanni Chierchia, Nelly Pustelnik, Beatrice Pesquet-Popescu et al.
Non-Local Total Variation (NLTV) has emerged as a useful tool in variational methods for image recovery problems. In this paper, we extend the NLTV-based regularization to multicomponent images by taking advantage of the Structure Tensor (ST) resulting from the gradient of a multicomponent image. The proposed approach allows us to penalize the non-local variations, jointly for the different components, through various $\ell_{1,p}$ matrix norms with $p \ge 1$. To facilitate the choice of the hyper-parameters, we adopt a constrained convex optimization approach in which we minimize the data fidelity term subject to a constraint involving the ST-NLTV regularization. The resulting convex optimization problem is solved with a novel epigraphical projection method. This formulation can be efficiently implemented thanks to the flexibility offered by recent primal-dual proximal algorithms. Experiments are carried out for multispectral and hyperspectral images. The results demonstrate the interest of introducing a non-local structure tensor regularization and show that the proposed approach leads to significant improvements in terms of convergence speed over current state-of-the-art methods.