Tai Le Quy

h-index 8

7 papers

410 citations

Novelty 29%

AI Score 27

Ranked #153,988 of 194,257 authors (top 79%); #33,842 in LG (top 84%)

7 Papers

3.8 LG Jan 9, 2023

A review of clustering models in educational data science towards fairness-aware learning

Tai Le Quy, Gunnar Friege, Eirini Ntoutsi

Ensuring fairness is essential for every education system. Machine learning is increasingly supporting the education system and educational data science (EDS) domain, from decision support to educational activities and learning analytics. However, machine learning-based decisions can be biased because the algorithms may generate results based on students' protected attributes such as race or gender. Clustering is an important machine learning technique to explore student data in order to support the decision-maker, as well as support educational activities, such as group assignments. Therefore, ensuring high-quality clustering models and satisfying fairness constraints are important requirements. This chapter comprehensively surveys clustering models and their fairness in EDS. We especially focus on investigating the fair clustering models applied in educational activities. These models are believed to be practical tools for analyzing students' data and ensuring fairness in EDS.

6.9 LG Aug 22, 2022

Evaluation of group fairness measures in student performance prediction problems

Tai Le Quy, Thi Huyen Nguyen, Gunnar Friege et al.

Predicting students' academic performance is one of the key tasks of educational data mining (EDM). Traditionally, the high forecasting quality of such models was deemed critical. More recently, the issues of fairness and discrimination w.r.t. protected attributes, such as gender or race, have gained attention. Although there are several fairness-aware learning approaches in EDM, a comparative evaluation of fairness measures is still missing. In this paper, we evaluate different group fairness measures for student performance prediction problems on various educational datasets and fairness-aware learning models. Our study shows that the choice of the fairness measure is important, as is the choice of the grade threshold.
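To make the role of the grade threshold concrete, here is a minimal sketch of one common group fairness measure, statistical parity difference, computed on thresholded grades. The function name, the 0/1 encoding of the protected attribute, and the toy grades are all assumptions for illustration; they are not taken from the paper.

```python
def statistical_parity_difference(predicted_grades, protected, threshold):
    """Return P(pass | non-protected) - P(pass | protected).

    A student "passes" when the predicted grade is >= threshold.
    `protected` holds 1 for the protected group and 0 otherwise (assumed encoding).
    """
    def pass_rate(group_value):
        grades = [g for g, p in zip(predicted_grades, protected) if p == group_value]
        if not grades:
            raise ValueError(f"no students with protected == {group_value}")
        return sum(g >= threshold for g in grades) / len(grades)

    return pass_rate(0) - pass_rate(1)


# Hypothetical predicted grades for eight students.
predicted = [12, 9, 15, 8, 11, 14, 7, 10]
is_protected = [0, 0, 0, 0, 1, 1, 1, 1]

spd_10 = statistical_parity_difference(predicted, is_protected, threshold=10)
spd_12 = statistical_parity_difference(predicted, is_protected, threshold=12)
print(spd_10, spd_12)
```

On this toy data the sign of the measure flips when the threshold moves from 10 to 12, which illustrates why the threshold has to be reported alongside the fairness value.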

1.8 LG Jun 20, 2022

Multiple Fairness and Cardinality constraints for Students-Topics Grouping Problem

Tai Le Quy, Gunnar Friege, Eirini Ntoutsi

Group work is a prevalent activity in educational settings, where students are often divided into topic-specific groups based on their preferences. The grouping should reflect the students' aspirations as much as possible. Usually, the resulting groups should also be balanced in terms of protected attributes like gender or race since studies indicate that students might learn better in a diverse group. Moreover, balancing the group cardinalities is also an essential requirement for fair workload distribution across the groups. In this paper, we introduce the multi-fair capacitated (MFC) grouping problem that fairly partitions students into non-overlapping groups while ensuring balanced group cardinalities (with a lower bound and an upper bound), and maximizing the diversity of members in terms of protected attributes. We propose two approaches: a heuristic method and a knapsack-based method to obtain the MFC grouping. The experiments on a real dataset and a semi-synthetic dataset show that our proposed methods can satisfy students' preferences well and deliver balanced and diverse groups regarding cardinality and the protected attribute, respectively.
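The three requirements named in the abstract (preference satisfaction, cardinality bounds, and diversity w.r.t. a protected attribute) can be checked for a given grouping with a few lines of code. The sketch below only evaluates a grouping; it is not the heuristic or the knapsack-based method from the paper. The function name, the min/max balance score, and the toy data are assumptions.

```python
from collections import Counter


def evaluate_grouping(assignment, preferences, protected, lower, upper):
    """Score a student-to-topic assignment.

    assignment:  {student: topic}
    preferences: {student: {topic: score}}
    protected:   {student: protected attribute value}
    Returns (cardinality_ok, total_preference, balance_per_topic).
    """
    groups = {}
    for student, topic in assignment.items():
        groups.setdefault(topic, []).append(student)

    # Every non-empty group must respect the lower and upper size bounds.
    cardinality_ok = all(lower <= len(m) <= upper for m in groups.values())

    # Sum of the preference scores of the topics students were assigned to.
    total_preference = sum(preferences[s][t] for s, t in assignment.items())

    # Balance (assumed definition): smallest / largest count over all
    # protected values in the cohort; 0.0 if a value is absent from a group.
    all_values = set(protected.values())
    balance = {}
    for topic, members in groups.items():
        counts = Counter(protected[s] for s in members)
        per_value = [counts.get(v, 0) for v in all_values]
        balance[topic] = min(per_value) / max(per_value)
    return cardinality_ok, total_preference, balance


# Hypothetical cohort: six students, two topics.
protected = {"s1": "F", "s2": "M", "s3": "F", "s4": "M", "s5": "F", "s6": "M"}
preferences = {s: {"A": 1, "B": 2} for s in protected}
assignment = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B", "s6": "B"}

ok, pref, balance = evaluate_grouping(assignment, preferences, protected, 2, 4)
print(ok, pref, balance)
```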

4.1 LG Mar 2, 2025

FACROC: a fairness measure for FAir Clustering through ROC curves

Tai Le Quy, Long Le Thanh, Lan Luong Thi Hong et al.

Fair clustering has attracted remarkable attention from the research community. Many fairness measures for clustering have been proposed; however, they do not take into account the clustering quality w.r.t. the values of the protected attribute. In this paper, we introduce a new visual-based fairness measure for fair clustering through ROC curves, namely FACROC. This fairness measure employs AUCC as a measure of clustering quality and then computes the difference in the corresponding ROC curves for each value of the protected attribute. Experimental results on several popular datasets for fairness-aware machine learning and well-known (fair) clustering models show that FACROC is a beneficial method for visually evaluating the fairness of clustering models.
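The idea of scoring clustering quality separately for each value of the protected attribute can be sketched as follows. AUCC is computed here in its usual pairwise form: every pair of points is an instance, its similarity (negative Euclidean distance) is the score, and "same cluster" is the positive label. Reducing the comparison of ROC curves to an absolute difference of the two areas is a simplification assumed for this sketch and may differ from how FACROC compares the curves; the function names and toy data are hypothetical.

```python
from itertools import combinations


def aucc(points, labels):
    """Area under the ROC curve over point pairs (rank-based formulation)."""
    same, different = [], []
    for i, j in combinations(range(len(points)), 2):
        dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
        # Higher score = more similar, so use the negative distance.
        (same if labels[i] == labels[j] else different).append(-dist)
    if not same or not different:
        raise ValueError("need both same-cluster and cross-cluster pairs")
    # Probability that a same-cluster pair is more similar than a
    # cross-cluster pair; ties count as one half.
    wins = sum((s > d) + 0.5 * (s == d) for s in same for d in different)
    return wins / (len(same) * len(different))


def aucc_by_group(points, labels, protected):
    """AUCC restricted to the points of each protected-attribute value."""
    result = {}
    for value in sorted(set(protected)):
        idx = [k for k, p in enumerate(protected) if p == value]
        result[value] = aucc([points[k] for k in idx], [labels[k] for k in idx])
    return result


# Hypothetical data: group "a" is clustered cleanly, group "b" is not.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 5), (10, 5), (1, 5), (9, 5)]
labels = [0, 0, 1, 1, 0, 0, 1, 1]
protected = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = aucc_by_group(points, labels, protected)
gap = abs(per_group["a"] - per_group["b"])
print(per_group, gap)
```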

33.9 LG Oct 1, 2021 Code

A survey on datasets for fairness-aware machine learning

Tai Le Quy, Arjun Roy, Vasileios Iosifidis et al.

As decision-making increasingly relies on Machine Learning (ML) and (big) data, the issue of fairness in data-driven Artificial Intelligence (AI) systems is receiving increasing attention from both research and industry. A large variety of fairness-aware machine learning solutions have been proposed which involve fairness-related interventions in the data, learning algorithms and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware machine learning. We focus on tabular data as the most common data representation for fairness-aware machine learning. We start our analysis by identifying relationships between the different attributes, particularly w.r.t. protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate the interesting relationships using exploratory analysis.

8.4 LG Apr 25, 2021

Fair-Capacitated Clustering

Tai Le Quy, Arjun Roy, Gunnar Friege et al.

Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair representation of the groups, in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain, where studies indicate that students might learn better in diverse student groups and, of course, groups of similar cardinality are more practical, e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step, while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.
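As a rough illustration of how a knapsack formulation can enforce a cardinality limit during assignment, the sketch below runs a standard 0/1 knapsack over fairlets for a single cluster. Treating each fairlet's size as the item weight and a closeness-to-center score as the item value is an assumption made here; the abstract does not spell out the exact formulation, and the function name and numbers are hypothetical.

```python
def knapsack_select(weights, values, capacity):
    """0/1 knapsack by dynamic programming; returns the chosen item indices."""
    n = len(weights)
    # best[i][c] = best total value using the first i items within capacity c.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v

    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return sorted(chosen)


# Hypothetical fairlets: sizes (number of students) and closeness scores
# to one cluster center; the cluster may hold at most 6 students.
fairlet_sizes = [2, 3, 2, 4]
closeness = [3.0, 4.0, 2.5, 5.0]

selected = knapsack_select(fairlet_sizes, closeness, capacity=6)
print(selected, sum(fairlet_sizes[i] for i in selected))
```

Because whole fairlets are assigned, each selected set keeps the fairness property of its fairlets while the capacity bounds the cluster size.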

3.3 SP Mar 30, 2021

Data augmentation for dealing with low sampling rates in NILM

Tai Le Quy, Sergej Zerr, Eirini Ntoutsi et al.

Data play an important role in evaluating the performance of NILM algorithms. The best performance of NILM algorithms is achieved with high-quality evaluation data. However, many existing real-world data sets have a low sampling rate and often contain gaps, lacking data for some recording periods. As a result, on such data, NILM algorithms can hardly recognize devices and estimate their power consumption properly. An important step towards improving the performance of these energy disaggregation methods is to improve the quality of the data sets. In this paper, we carry out experiments using several methods to increase the sampling rate of low sampling rate data. Our results show that augmentation of low-frequency data can support the considered NILM algorithms in estimating appliances' consumption with a higher F-score.
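One simple way to raise the sampling rate of a power series is linear interpolation between consecutive readings. The abstract does not name the methods that were evaluated, so linear interpolation is an assumption chosen for this sketch, as are the function name and the sample readings.

```python
def upsample_linear(timestamps, power, factor):
    """Insert factor - 1 linearly interpolated samples between readings."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if len(timestamps) != len(power) or len(timestamps) < 2:
        raise ValueError("need two or more aligned samples")

    new_t, new_p = [], []
    for k in range(len(timestamps) - 1):
        t0, t1 = timestamps[k], timestamps[k + 1]
        p0, p1 = power[k], power[k + 1]
        for step in range(factor):
            frac = step / factor
            new_t.append(t0 + frac * (t1 - t0))
            new_p.append(p0 + frac * (p1 - p0))
    # The last original reading closes the series.
    new_t.append(timestamps[-1])
    new_p.append(power[-1])
    return new_t, new_p


# Hypothetical aggregate readings: one sample per 60 s, power in watts.
t = [0, 60, 120]
p = [100.0, 160.0, 40.0]

t_up, p_up = upsample_linear(t, p, factor=2)
print(t_up, p_up)
```

Interpolation smooths over appliance switching events, so it should not be applied blindly across long recording gaps.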