Talel Abdessalem

LG
5 papers
708 citations
Novelty 31%
AI Score 26

5 Papers

LG · Dec 8, 2020 · Code
River: machine learning for streaming data in Python

Jacob Montiel, Max Halford, Saulo Martiello Mastelini et al.

River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning problems. It is the result of the merger of the two most popular packages for stream learning in Python: Creme and scikit-multiflow. River introduces a revamped architecture based on the lessons learnt from the seminal packages. River's ambition is to be the go-to library for doing machine learning on streaming data. This open-source package also brings a large community of practitioners and researchers under one umbrella. The source code is available at https://github.com/online-ml/river.
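To make the stream-learning setting concrete, here is a minimal sketch of prequential (test-then-train) evaluation, in which each arriving example is first used to test the model and then to update it. It does not use River itself: `OnlineLogisticRegression`, `make_stream`, and `prequential_accuracy` are hypothetical names, and the synthetic stream is an assumption. Features are plain dicts and the `predict_one`/`learn_one` method names only mirror the one-example-at-a-time style of River's models.

```python
import math
import random


class OnlineLogisticRegression:
    """Toy incremental binary classifier: one SGD step per example (hypothetical)."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}  # feature name -> weight, created lazily on first sight
        self.bias = 0.0

    def _proba(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-35.0, min(35.0, z))  # clip so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return self._proba(x) >= 0.5

    def learn_one(self, x, y):
        # Gradient of the log loss w.r.t. the linear score is (p - y).
        err = self._proba(x) - float(y)
        self.bias -= self.lr * err
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * err * v


def make_stream(n, seed=0):
    """Hypothetical synthetic stream: the label is True when a + b > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"a": rng.uniform(-1, 1), "b": rng.uniform(-1, 1)}
        yield x, x["a"] + x["b"] > 0


def prequential_accuracy(model, stream):
    """Test-then-train: predict each example before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0
```

Because every example is scored before the model sees its label, the resulting accuracy reflects performance on unseen data without a separate hold-out set, and the model never needs to store the stream.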

LG · Jul 12, 2018 · Code
Scikit-Multiflow: A Multi-output Streaming Framework

Jacob Montiel, Jesse Read, Albert Bifet et al.

Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived as a platform to encourage the democratization of stream learning research, it provides multiple state-of-the-art methods for stream learning, stream generators and evaluators. scikit-multiflow builds upon popular open source frameworks including scikit-learn, MOA and MEKA. Development follows FOSS principles, and quality is enforced by complying with PEP 8 guidelines and using continuous integration and automatic testing. The source code is publicly available at https://github.com/scikit-multiflow/scikit-multiflow.

LG · May 15, 2020
Adaptive XGBoost for Evolving Data Streams

Jacob Montiel, Rory Mitchell, Eibe Frank et al.

Boosting is an ensemble method that combines base models in a sequential manner to achieve high predictive accuracy. A popular learning algorithm based on this ensemble method is eXtreme Gradient Boosting (XGB). We present an adaptation of XGB for classification of evolving data streams. In this setting, new data arrives over time and the relationship between the class and the features may change in the process, thus exhibiting concept drift. The proposed method creates new members of the ensemble from mini-batches of data as new data becomes available. The maximum ensemble size is fixed, but learning does not stop when this size is reached because the ensemble is updated on new data to ensure consistency with the current concept. We also explore the use of concept drift detection to trigger a mechanism to update the ensemble. We test our method on real and synthetic data with concept drift and compare it against batch-incremental and instance-incremental classification methods for data streams.
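The update loop the abstract describes (new members built from mini-batches, a fixed maximum ensemble size, and learning that continues after the cap is reached) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: one-dimensional decision stumps fitted to squared-error residuals stand in for XGBoost trees, labels are ±1, and when the ensemble is full the oldest member is dropped, which is only one possible update policy. `Stump`, `FixedSizeBoostedEnsemble`, `make_batch`, `max_size`, and `lr` are hypothetical names, and the drifting concept is synthetic.

```python
import random
from collections import deque


class Stump:
    """1-D regression stump: constant value left/right of a threshold."""

    def fit(self, xs, rs):
        best = None
        for t in sorted(set(xs)):
            left = [r for x, r in zip(xs, rs) if x <= t]  # never empty: t is in xs
            right = [r for x, r in zip(xs, rs) if x > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right) if right else lm
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, t, lm, rm)
        _, self.t, self.lm, self.rm = best
        return self

    def predict(self, x):
        return self.lm if x <= self.t else self.rm


class FixedSizeBoostedEnsemble:
    def __init__(self, max_size=5, lr=0.5):
        # A bounded deque silently drops the oldest member once it is full.
        self.members = deque(maxlen=max_size)
        self.lr = lr

    def score(self, x):
        return sum(self.lr * m.predict(x) for m in self.members)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, xs, ys):
        # Each new member is fitted to the residuals of the current ensemble.
        residuals = [y - self.score(x) for x, y in zip(xs, ys)]
        self.members.append(Stump().fit(xs, residuals))


def make_batch(rng, n, flipped=False):
    """Hypothetical drifting concept: y = sign(x) before drift, -sign(x) after."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [(-1 if x > 0 else 1) if flipped else (1 if x > 0 else -1) for x in xs]
    return xs, ys


def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)
```

When the concept drifts, the residuals on new mini-batches become large, so the next members correct the ensemble's output, while the size cap gradually discards members that were trained on the old concept.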

LG · Mar 31, 2016
Pessimistic Uplift Modeling

Atef Shaar, Talel Abdessalem, Olivier Segard

Uplift modeling is a machine learning technique that aims to model heterogeneity in treatment effects. It has been used in the business and health sectors to predict the effect of a specific action on a given individual. Despite their advantages, uplift models are highly sensitive to noise and disturbance, which leads to unreliable results. In this paper we review different approaches to uplift modeling and demonstrate how disturbance in the data can affect uplift measurement. We propose a new approach, which we call Pessimistic Uplift Modeling, that minimizes the effects of disturbance. We compare our approach with existing uplift methods on simulated and real datasets. The experiments show that our approach outperforms existing approaches, especially in high-noise data environments.
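For readers new to uplift modeling, a common baseline is the two-model approach: estimate the response rate separately for treated and control individuals and take the difference. The sketch below does this per discrete segment with simple frequency counts. It is not the Pessimistic Uplift Modeling method, whose details the abstract does not give; `two_model_uplift` and the `(segment, treated, converted)` record layout are hypothetical.

```python
from collections import defaultdict


def two_model_uplift(records):
    """Estimate uplift per segment as P(convert | treated) - P(convert | control).

    records: iterable of (segment, treated: bool, converted: bool) tuples.
    Segments lacking either treated or control individuals are skipped,
    since their uplift cannot be estimated.
    """
    # segment -> [treated_n, treated_pos, control_n, control_pos]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for segment, treated, converted in records:
        c = counts[segment]
        offset = 0 if treated else 2
        c[offset] += 1
        c[offset + 1] += int(converted)

    uplift = {}
    for segment, (tn, tp, cn, cp) in counts.items():
        if tn and cn:
            uplift[segment] = tp / tn - cp / cn
    return uplift
```

With only a handful of individuals per segment, a single flipped outcome shifts the estimate considerably (one conversion out of four treated individuals moves it by 0.25), which illustrates the sensitivity to noise the abstract describes.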

IR · Sep 29, 2013
Improving tag recommendation by folding in more consistency

Modou Gueye, Talel Abdessalem, Hubert Naacke

Tag recommendation is a major aspect of collaborative tagging systems. It aims to recommend tags to a user for tagging an item. In this paper we present part of our work in progress: a novel way to improve recommendations by re-ranking the output of a tag recommender. We mine association rules between candidate tags in order to determine a more consistent list of tags to recommend. Our method is an add-on that leads to better recommendations, as we show in this paper. It is easily parallelizable and can be applied to many tag recommenders. Experiments on five datasets with two kinds of tag recommenders demonstrate the efficiency of our method.
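The re-ranking idea can be sketched with pairwise association rules: mine rules `a → b` with confidence `count(a, b) / count(a)` from past tag assignments, then boost each candidate tag by the confidence of the rules that point to it from the other candidates. The scoring formula, the `alpha` weight, `min_support`, and the function names are illustrative assumptions, not the authors' exact method.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(tag_sets, min_support=2):
    """Return {(a, b): confidence of a -> b} for tag pairs seen >= min_support times."""
    single = Counter()
    pair = Counter()
    for tags in tag_sets:
        tags = set(tags)
        single.update(tags)
        pair.update(permutations(tags, 2))  # both (a, b) and (b, a)
    return {(a, b): n / single[a] for (a, b), n in pair.items() if n >= min_support}


def rerank(candidates, rules, alpha=0.5):
    """Re-rank {tag: base_score} from any recommender using rule support.

    Each candidate gains alpha * (sum of confidences of rules from the
    other candidates to it), rewarding tags consistent with the rest.
    """
    rescored = {}
    for tag, score in candidates.items():
        support = sum(rules.get((other, tag), 0.0) for other in candidates if other != tag)
        rescored[tag] = score + alpha * support
    return sorted(rescored, key=rescored.get, reverse=True)
```

The rescoring loop treats each candidate independently of the others' new scores, so it can be split across workers without coordination.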