Michael Aupetit

HC
8 papers
28 citations
Novelty 31%
AI Score 19

8 Papers

LG · Sep 20, 2022
Sanity Check for External Clustering Validation Benchmarks using Internal Validation Measures

Hyeon Jeon, Michael Aupetit, DongHwa Shin et al.

We address the lack of reliability in benchmarking clustering techniques based on labeled datasets. A standard scheme in external clustering validation is to use class labels as ground truth clusters, based on the assumption that each class forms a single, clearly separated cluster. However, because this cluster-label matching (CLM) assumption often breaks, the absence of a sanity check for the CLM of benchmark datasets casts doubt on the validity of external validations. Still, evaluating the degree of CLM is challenging. For example, internal clustering validation measures can quantify CLM within a single dataset to compare its different clusterings, but they are not designed to compare clusterings of different datasets. In this work, we propose a principled way to generate between-dataset internal measures that enable the comparison of CLM across datasets. We first determine four axioms for between-dataset internal measures, complementing Ackerman and Ben-David's within-dataset axioms. We then propose processes to generalize internal measures to fulfill these new axioms, and use them to extend the widely used Calinski-Harabasz index for between-dataset CLM evaluation. Through quantitative experiments, we (1) verify the validity and necessity of the generalization processes and (2) show that the proposed between-dataset Calinski-Harabasz index accurately evaluates CLM across datasets. Finally, we demonstrate the importance of evaluating CLM of benchmark datasets before conducting external validation.
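For readers unfamiliar with the Calinski-Harabasz index mentioned in the abstract above, the following is a minimal sketch of the standard within-dataset index, computed with class labels used as clusters. It is not the between-dataset variant the paper proposes; the function name `calinski_harabasz` and the toy points are illustrative assumptions.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Standard Calinski-Harabasz index, treating each label as one cluster.

    Illustrative only: this is the classic within-dataset index, not the
    between-dataset generalization described in the abstract.
    """
    n = len(points)
    dim = len(points[0])
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # size-weighted squared distances of cluster centroids to the overall centroid
    within = 0.0   # squared distances of points to their own cluster centroid
    for rows in groups.values():
        c = centroid(rows)
        between += len(rows) * sq_dist(c, overall)
        within += sum(sq_dist(r, c) for r in rows)
    if within == 0:
        return float("inf")  # assumption: perfectly compact clusters score infinity
    return (between / (k - 1)) / (within / (n - k))


# Toy data (hypothetical): two tight groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
matched = calinski_harabasz(toy_points, [0, 0, 0, 1, 1, 1])     # labels follow the groups
mismatched = calinski_harabasz(toy_points, [0, 1, 0, 1, 0, 1])  # labels cut across the groups
print(matched, mismatched)
```

On this toy data, labels that follow the two groups score much higher than labels that cut across them, which is the intuition behind using an internal measure to check cluster-label matching.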

HC · Jan 30, 2022
ClassSPLOM -- A Scatterplot Matrix to Visualize Separation of Multiclass Multidimensional Data

Michael Aupetit, Ahmed Ali

In multiclass classification of multidimensional data, the user wants to build a model of the classes to predict the label of unseen data. The model is trained on the data and tested on unseen data with known labels to evaluate its quality. The results are visualized as a confusion matrix which shows how many data labels have been predicted correctly or confused with other classes. The multidimensional nature of the data prevents the direct visualization of the classes, so we design ClassSPLOM to give more perceptual insights about the classification results. It uses the Scatterplot Matrix (SPLOM) metaphor to visualize a Linear Discriminant Analysis projection of the data for each pair of classes and a set of Receiver Operating Characteristic curves to evaluate their trustworthiness. We illustrate ClassSPLOM on a use case in Arabic dialect identification.

HC · Jan 29, 2021
Aquanims: Area-Preserving Animated Transitions in Statistical Data Graphics based on a Hydraulic Metaphor

Michael Aupetit

We propose "aquanims" as new design metaphors for animated transitions that preserve displayed areas during the transformation. Animated transitions are used to facilitate understanding of graphical transformations between different visualizations. Area is key information to preserve during filtering or ordering transitions of area-based charts like bar charts, histograms, treemaps, or mosaic plots. As liquids are incompressible fluids, we use a hydraulic metaphor to convey the sense of area preservation during animated transitions: in aquanims, graphical objects can change shape, position, color, and even connectedness but not displayed area, as for a liquid contained in a transparent vessel or transferred between such vessels communicating through hidden pipes. We present various aquanims for product plots like bar charts and histograms to accommodate changes in data, in the ordering of bars, or in the number of bins, and to provide animated tips. We also consider confusion matrices visualized as fluctuation diagrams and mosaic plots, and show how aquanims can be used to ease the understanding of different classification errors of real data.
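The area-preservation constraint described above can be illustrated with a very small sketch: a single bar changes width over the course of a transition while its height is adjusted so that the displayed area never changes. This is not the author's implementation; the function name `area_preserving_frames` and the linear interpolation of the width are assumptions made for illustration.

```python
def area_preserving_frames(width0, height0, width1, steps):
    """Return (width, height) pairs for a bar that morphs from width0 to width1.

    Assumption: the width is interpolated linearly; the height is then derived
    so that width * height stays equal to the initial area in every frame.
    """
    if width0 <= 0 or width1 <= 0 or steps < 1:
        raise ValueError("widths must be positive and steps >= 1")
    area = width0 * height0
    frames = []
    for i in range(steps + 1):
        t = i / steps
        width = (1 - t) * width0 + t * width1
        frames.append((width, area / width))
    return frames


# Hypothetical example: a 2 x 6 bar is squeezed to half its width in 4 steps.
frames = area_preserving_frames(2.0, 6.0, 1.0, 4)
for width, height in frames:
    print(f"width={width:.2f} height={height:.2f} area={width * height:.2f}")
```

Every frame keeps the same area, much as a fixed volume of liquid rises when its vessel narrows.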

HC · Dec 8, 2020
An Enhanced MA Plot with R-Shiny to Ease Exploratory Analysis of Transcriptomic Data

Ali Sheharyar, Talar Boghos Yacoubian, Dina Aljogol et al.

MA plots are used to analyze the genome-wide differences in gene expression between two distinct biological conditions. An MA plot is usually rendered as a static scatter plot. Our interview with 3 experts in genomics showed that we could improve the usability of this plot by adding interactive analytic features. In this work we present the design study of the enhanced MA plot.
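As background for the abstract above, the coordinates of an MA plot are conventionally the log ratio (M) and the mean log intensity (A) of each gene's expression under two conditions. The sketch below computes them; the function name `ma_values`, the `pseudocount` parameter, and the sample numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def ma_values(expr_a, expr_b, pseudocount=1.0):
    """Compute (A, M) coordinates for paired expression values.

    M = log2(a) - log2(b)          (log fold change between conditions)
    A = (log2(a) + log2(b)) / 2    (average log expression)
    Assumption: a pseudocount is added before the log to avoid log2(0).
    """
    points = []
    for a, b in zip(expr_a, expr_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        points.append((0.5 * (log_a + log_b), log_a - log_b))
    return points


# Hypothetical expression values for three genes in two conditions.
condition_1 = [3.0, 15.0, 7.0]
condition_2 = [0.0, 15.0, 31.0]
ma_points = ma_values(condition_1, condition_2)
for a_val, m_val in ma_points:
    print(f"A={a_val:.2f} M={m_val:.2f}")
```

Genes with similar expression in both conditions fall near M = 0, while strongly differential genes sit far above or below that line.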

HC · Nov 15, 2020
Aquanims -- Area-Preserving Animated Transitions based on a Hydraulic Metaphor

Michael Aupetit

We propose "Aquanims" as new design metaphors for animated transitions that preserve displayed areas during the transformation. As liquids are incompressible fluids, we use a hydraulic metaphor to convey the sense of area preservation during animated transitions. We study the design space of Aquanims for rectangle-based charts.

HC · May 15, 2017
Visualizing Dimensionality Reduction Artifacts: An Evaluation

Nicolas Heulot, Jean-Daniel Fekete, Michael Aupetit

Multidimensional scaling allows visualizing high-dimensional data as 2D maps with the premise that insights in 2D reveal valid information in high dimensions. However, the resulting projections suffer from artifacts such as bad local neighborhood preservation and cluster tearing. Interactively coloring the projection according to the discrepancy between original proximities relative to a reference item reveals these artifacts, but it is not clear if conveying these proximities using color and displaying only local information really helps the visual analysis of projections. We conducted a controlled experiment to investigate the relevance of this interactive technique to help the visual analysis of any projection regardless of its quality. We compared the bare projection to the interactive coloring of the original proximities on different visual analysis tasks involving outliers and clusters. Results indicate that the interactive coloring is worthwhile for local tasks, as it is significantly robust to projection artifacts whereas the projection is not. However, this interactive technique does not help significantly for visual clustering tasks, for which projections already give a suitable overview.

HC · May 10, 2017
Visualization of Wearable Data and Biometrics for Analysis and Recommendations in Childhood Obesity

Michael Aupetit, Luis Fernandez-Luque, Meghna Singh et al.

Obesity is one of the major health risk factors behind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex since there are many variables that can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physical activity and sleep, social media, mobile and health data. In this paper we describe the design of a dashboard for the visualization of actigraphy and biometric data from a childhood obesity camp in Qatar. This dashboard allows quantitative discoveries that can be used to guide patient behavior and orient qualitative research.

IR · Mar 9, 2012
A new supervised non-linear mapping

Sylvain Lespinats, Anke Meyer-Baese, Michael Aupetit

Supervised mapping methods project multi-dimensional labeled data onto a 2-dimensional space, attempting to preserve both data similarities and the topology of classes. Supervised mappings are expected to help the user understand the underlying original class structure and classify new data visually. Several methods have been designed to achieve supervised mapping, but many of them modify original distances prior to the mapping, so that original data similarities are corrupted and even overlapping classes tend to be separated on the map, ignoring their original topology. We propose ClassiMap, an alternative method for supervised mapping. Mappings come with distortions which can be split between tears (close points mapped far apart) and false neighborhoods (points far apart mapped as neighbors). Some mapping methods favor the former while others favor the latter. ClassiMap switches between such mapping methods so that tears tend to appear between classes and false neighborhoods within classes, better preserving the classes' topology. We also propose two new objective criteria instead of the usual subjective visual inspection to perform fair comparisons of supervised mapping methods. ClassiMap appears to be the best supervised mapping method according to these criteria in our experiments on synthetic and real datasets.
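The two kinds of distortion named above, tears and false neighborhoods, can be made concrete with a small sketch that compares k-nearest-neighbor sets before and after a mapping. This is only one common way to operationalize the two notions and is not the ClassiMap algorithm or the paper's criteria; the function names and the choice of k are assumptions.

```python
import math


def nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def mapping_distortions(original, mapped, k):
    """List tears and false neighbors as (i, j) index pairs.

    Assumed definitions for this sketch:
      tear:           j is a k-nearest neighbor of i in the original space
                      but not in the map (close points mapped far apart).
      false neighbor: j is a k-nearest neighbor of i in the map but not in
                      the original space (far points mapped as neighbors).
    """
    tears, false_neighbors = [], []
    for i in range(len(original)):
        orig_nn = nearest_neighbors(original, i, k)
        map_nn = nearest_neighbors(mapped, i, k)
        tears += [(i, j) for j in sorted(orig_nn - map_nn)]
        false_neighbors += [(i, j) for j in sorted(map_nn - orig_nn)]
    return tears, false_neighbors


# Hypothetical example: the four corners of a square (a closed loop)
# are unrolled onto a line, which must cut the loop somewhere.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
line = [(0.0,), (1.0,), (2.0,), (3.0,)]
tears, false_neighbors = mapping_distortions(square, line, k=2)
print("tears:", tears)
print("false neighbors:", false_neighbors)
```

In this example the loop is cut between corners 0 and 3, so that pair shows up as a tear, while the line places some non-adjacent corners next to each other as false neighbors.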