Clustering data by reordering them
An incremental method for clustering data from various scientific domains, such as biomolecular structures and images.
The paper addresses data clustering by reordering elements according to their similarity and dissimilarity. The resulting algorithm performs the analysis automatically, uses easily understandable parameters, and handles noise explicitly.
Grouping elements into families to analyse them separately is a standard procedure in many areas of science. We propose a new algorithm based on a simple idea: members of a family resemble one another and do not resemble elements outside the family. Once the data are reordered according to the distances between elements, the analysis is performed automatically with easily understandable parameters. Noise is explicitly taken into account to cope with the variety of problems arising in a data-driven world. We applied the algorithm to sort biomolecular conformations, gene sequences, cells, images, and experimental conditions.
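The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' method. It assumes a precomputed symmetric distance matrix, orders elements with a greedy nearest-neighbour walk, starts a new family wherever two consecutive elements are farther apart than a distance `threshold`, and labels any family smaller than `min_size` as noise (`-1`). The function names `reorder` and `cluster_from_order` and both parameters are hypothetical.

```python
import numpy as np


def reorder(dist: np.ndarray, start: int = 0) -> list[int]:
    """Hypothetical greedy ordering: repeatedly visit the closest unvisited element."""
    n = dist.shape[0]
    visited = np.zeros(n, dtype=bool)
    order = [start]
    visited[start] = True
    for _ in range(n - 1):
        # Distances from the last visited element; already visited ones are masked out.
        candidates = np.where(visited, np.inf, dist[order[-1]])
        nxt = int(np.argmin(candidates))
        order.append(nxt)
        visited[nxt] = True
    return order


def cluster_from_order(dist: np.ndarray, order: list[int],
                       threshold: float, min_size: int) -> np.ndarray:
    """Cut the ordering where consecutive elements are farther apart than `threshold`.

    Groups with fewer than `min_size` members are labelled -1 (noise), an
    assumption made for this sketch.
    """
    labels = np.full(len(order), -1, dtype=int)
    groups, current = [], [order[0]]
    for prev, cur in zip(order, order[1:]):
        if dist[prev, cur] > threshold:  # large jump: start a new family
            groups.append(current)
            current = []
        current.append(cur)
    groups.append(current)

    next_label = 0
    for group in groups:
        if len(group) >= min_size:
            labels[group] = next_label
            next_label += 1
    return labels


# Usage on toy 1-D data: two tight groups and one isolated point.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0])
dist = np.abs(points[:, None] - points[None, :])
order = reorder(dist)
labels = cluster_from_order(dist, order, threshold=1.0, min_size=2)
print(order)            # [0, 1, 2, 3, 4, 5, 6]
print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

The greedy walk is a deliberately simple choice: it keeps similar elements adjacent, so families show up as runs of small consecutive distances. Its weakness is that a member skipped during the walk can end up isolated later in the ordering and be flagged as noise.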