Leandro L. Minku

LG
10papers
517citations
Novelty37%
AI Score27

10 Papers

LGAug 28, 2023
SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams

Chun Wai Chiu, Leandro L. Minku

Many real-world data stream applications not only suffer from concept drift but also class imbalance. Yet, very few existing studies investigated this joint challenge. Data difficulty factors, which have been shown to be key challenges in class imbalanced data streams, are not taken into account by existing approaches when learning class imbalanced data streams. In this work, we propose a drift adaptable oversampling strategy to synthesise minority class examples based on stream clustering. The motivation is that stream clustering methods continuously update themselves to reflect the characteristics of the current underlying concept, including data difficulty factors. This nature can potentially be used to compress past information without caching data in the memory explicitly. Based on the compressed information, synthetic examples can be created within the region that recently generated new minority class examples. Experiments with artificial and real-world data streams show that the proposed approach can handle concept drift involving different minority class decomposition better than existing approaches, especially when the data stream is severely class imbalanced and presenting high proportions of safe and borderline minority class examples.

LGAug 5, 2024
Cross-Modality Clustering-based Self-Labeling for Multimodal Data Classification

Paweł Zyblewski, Leandro L. Minku

Technological advances facilitate the ability to acquire multimodal data, posing a challenge for recognition systems while also providing an opportunity to use the heterogeneous nature of the information to increase the generalization capability of models. An often overlooked issue is the cost of the labeling process, which is typically high due to the need for a significant investment in time and money associated with human experts. Existing semi-supervised learning methods often focus on operating in the feature space created by the fusion of available modalities, neglecting the potential for cross-utilizing complementary information available in each modality. To address this problem, we propose Cross-Modality Clustering-based Self-Labeling (CMCSL). Based on a small set of pre-labeled data, CMCSL groups instances belonging to each modality in the deep feature space and then propagates known labels within the resulting clusters. Next, information about the instances' class membership in each modality is exchanged based on the Euclidean distance to ensure more accurate labeling. Experimental evaluation conducted on 20 datasets derived from the MM-IMDb dataset indicates that cross-propagation of labels between modalities -- especially when the number of pre-labeled instances is small -- can allow for more reliable labeling and thus increase the classification performance in each modality.

NEJan 12, 2022
Evolutionary Optimization for Proactive and Dynamic Computing Resource Allocation in Open Radio Access Network

Gan Ruan, Leandro L. Minku, Zhao Xu et al.

Intelligent techniques are urged to achieve automatic allocation of the computing resource in Open Radio Access Network (O-RAN), to save computing resource, increase utilization rate of them and decrease the delay. However, the existing problem formulation to solve this resource allocation problem is unsuitable as it defines the capacity utility of resource in an inappropriate way and tends to cause much delay. Moreover, the existing problem has only been attempted to be solved based on greedy search, which is not ideal as it could get stuck into local optima. Considering those, a new formulation that better describes the problem is proposed. In addition, as a well-known global search meta heuristic approach, an evolutionary algorithm (EA) is designed tailored for solving the new problem formulation, to find a resource allocation scheme to proactively and dynamically deploy the computing resource for processing upcoming traffic data. Experimental studies carried out on several real-world datasets and newly generated artificial datasets with more properties beyond the real-world datasets have demonstrated the significant superiority over a baseline greedy algorithm under different parameter settings. Moreover, experimental studies are taken to compare the proposed EA and two variants, to indicate the impact of different algorithm choices.

NEApr 14, 2021
A Novel Generalised Meta-Heuristic Framework for Dynamic Capacitated Arc Routing Problems

Hao Tong, Leandro L. Minku, Stefan Menzel et al.

The capacitated arc routing problem (CARP) is a challenging combinatorial optimisation problem abstracted from many real-world applications, such as waste collection, road gritting and mail delivery. However, few studies considered dynamic changes during the vehicles' service, which can cause the original schedule infeasible or obsolete. The few existing studies are limited by the dynamic scenarios considered, and by overly complicated algorithms that are unable to benefit from the wealth of contributions provided by the existing CARP literature. In this paper, we first provide a mathematical formulation of dynamic CARP (DCARP) and design a simulation system that is able to consider dynamic events while a routing solution is already partially executed. We then propose a novel framework which can benefit from existing static CARP optimisation algorithms so that they could be used to handle DCARP instances. The framework is very flexible. In response to a dynamic event, it can use either a simple restart strategy or a sequence transfer strategy that benefits from past optimisation experience. Empirical studies have been conducted on a wide range of DCARP instances to evaluate our proposed framework. The results show that the proposed framework significantly improves over state-of-the-art dynamic optimisation algorithms.

NENov 19, 2020
Metaheuristics "In the Large"

Jerry Swan, Steven Adriaensen, Alexander E. I. Brownlee et al.

Following decades of sustained improvement, metaheuristics are one of the great success stories of optimization research. However, in order for research in metaheuristics to avoid fragmentation and a lack of reproducibility, there is a pressing need for stronger scientific and computational infrastructure to support the development, analysis and comparison of new approaches. We argue that, via principled choice of infrastructure support, the field can pursue a higher level of scientific enquiry. We describe our vision and report on progress, showing how the adoption of common protocols for all metaheuristics can help liberate the potential of the field, easing the exploration of the design space of metaheuristics.

LGJan 7, 2019
Multi-Source Transfer Learning for Non-Stationary Environments

Honghui Du, Leandro L. Minku, Huiyu Zhou

In data stream mining, predictive models typically suffer drops in predictive performance due to concept drift. As enough data representing the new concept must be collected for the new concept to be well learnt, the predictive performance of existing models usually takes some time to recover from concept drift. To speed up recovery from concept drift and improve predictive performance in data stream mining, this work proposes a novel approach called Multi-sourcE onLine TrAnsfer learning for Non-statIonary Environments (Melanie). Melanie is the first approach able to transfer knowledge between multiple data streaming sources in non-stationary environments. It creates several sub-classifiers to learn different aspects from different source and target concepts over time. The sub-classifiers that match the current target concept well are identified, and used to compose an ensemble for predicting examples from the target concept. We evaluate Melanie on several synthetic data streams containing different types of concept drift and on real world data streams. The results indicate that Melanie can deal with a variety drifts and improve predictive performance over existing data stream learning algorithms by making use of multiple sources.

SEDec 4, 2018
Better Software Analytics via "DUO": Data Mining Algorithms Using/Used-by Optimizers

Amritanshu Agrawal, Tim Menzies, Leandro L. Minku et al.

This paper claims that a new field of empirical software engineering research and practice is emerging: data mining using/used-by optimizers for empirical studies or DUO. For example, data miners can generate models that are explored by optimizers. Also, optimizers can advise how to best adjust the control parameters of a data miner. This combined approach acts like an agent leaning over the shoulder of an analyst that advises "ask this question next" or "ignore that problem, it is not relevant to your goals". Further, those agents can help us build "better" predictive models, where "better" can be either greater predictive accuracy or faster modeling time (which, in turn, enables the exploration of a wider range of options). We also caution that the era of papers that just use data miners is coming to an end. Results obtained from an unoptimized data miner can be quickly refuted, just by applying an optimizer to produce a different (and better performing) model. Our conclusion, hence, is that for software analytics it is possible, useful and necessary to combine data mining and optimization using DUO.

LGJul 28, 2017
Proceedings of the IJCAI 2017 Workshop on Learning in the Presence of Class Imbalance and Concept Drift (LPCICD'17)

Shuo Wang, Leandro L. Minku, Nitesh Chawla et al.

With the wide application of machine learning algorithms to the real world, class imbalance and concept drift have become crucial learning issues. Class imbalance happens when the data categories are not equally represented, i.e., at least one category is minority compared to other categories. It can cause learning bias towards the majority class and poor generalization. Concept drift is a change in the underlying distribution of the problem, and is a significant issue specially when learning from data streams. It requires learners to be adaptive to dynamic changes. Class imbalance and concept drift can significantly hinder predictive performance, and the problem becomes particularly challenging when they occur simultaneously. This challenge arises from the fact that one problem can affect the treatment of the other. For example, drift detection algorithms based on the traditional classification error may be sensitive to the imbalanced degree and become less effective; and class imbalance techniques need to be adaptive to changing imbalance rates, otherwise the class receiving the preferential treatment may not be the correct minority class at the current moment. Therefore, the mutual effect of class imbalance and concept drift should be considered during algorithm design. The aim of this workshop is to bring together researchers from the areas of class imbalance learning and concept drift in order to encourage discussions and new collaborations on solving the combined issue of class imbalance and concept drift. It provides a forum for international researchers and practitioners to share and discuss their original work on addressing new challenges and research issues in class imbalance learning, concept drift, and the combined issues of class imbalance and concept drift. The proceedings include 8 papers on these topics.

LGMar 20, 2017
A Systematic Study of Online Class Imbalance Learning with Concept Drift

Shuo Wang, Leandro L. Minku, Xin Yao

As an emerging research topic, online class imbalance learning often combines the challenges of both class imbalance and concept drift. It deals with data streams having very skewed class distributions, where concept drift may occur. It has recently received increased research attention; however, very little work addresses the combined problem where both class imbalance and concept drift coexist. As the first systematic study of handling concept drift in class-imbalanced data streams, this paper first provides a comprehensive review of current research progress in this field, including current research focuses and open challenges. Then, an in-depth experimental study is performed, with the goal of understanding how to best overcome concept drift in online learning with class imbalance. Based on the analysis, a general guideline is proposed for the development of an effective algorithm.

SESep 5, 2014
The Handbook of Engineering Self-Aware and Self-Expressive Systems

Tao Chen, Funmilade Faniyi, Rami Bahsoon et al.

When faced with the task of designing and implementing a new self-aware and self-expressive computing system, researchers and practitioners need a set of guidelines on how to use the concepts and foundations developed in the Engineering Proprioception in Computing Systems (EPiCS) project. This report provides such guidelines on how to design self-aware and self-expressive computing systems in a principled way. We have documented different categories of self-awareness and self-expression level using architectural patterns. We have also documented common architectural primitives, their possible candidate techniques and attributes for architecting self-aware and self-expressive systems. Drawing on the knowledge obtained from the previous investigations, we proposed a pattern driven methodology for engineering self-aware and self-expressive systems to assist in utilising the patterns and primitives during design. The methodology contains detailed guidance to make decisions with respect to the possible design alternatives, providing a systematic way to build self-aware and self-expressive systems. Then, we qualitatively and quantitatively evaluated the methodology using two case studies. The results reveal that our pattern driven methodology covers the main aspects of engineering self-aware and self-expressive systems, and that the resulted systems perform significantly better than the non-self-aware systems.