LG · Apr 17, 2025

Sliced-Wasserstein Distance-based Data Selection

arXiv:2504.12918v1 · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses training data selection through unsupervised anomaly detection for decision-making pipelines in critical sectors. It appears incremental: it builds on existing sliced-Wasserstein distance techniques and contributes new approximations and applications.

The paper tackles training data selection by proposing an unsupervised anomaly detection method based on the sliced-Wasserstein distance. The method offers conservative selection and an optimal transport interpretation, both valuable in critical sectors such as power systems. It introduces two efficient approximations for scalability and evaluates the method on synthetic and real-world datasets, including a new open-source dataset of localized critical peak rebate demand response.

We propose a new unsupervised anomaly detection method based on the sliced-Wasserstein distance for training data selection in machine learning approaches. Our filtering technique is relevant to decision-making pipelines that deploy machine learning models in critical sectors, e.g., power systems, as it offers conservative data selection and an optimal transport interpretation. To ensure the scalability of our method, we provide two efficient approximations. The first processes reduced-cardinality representations of the datasets concurrently. The second uses a computationally light Euclidean distance approximation. Additionally, we release the first open dataset showcasing localized critical peak rebate demand response in a northern climate. We present the filtering patterns of our method on synthetic datasets and numerically benchmark it for training data selection. Finally, we employ our method as part of a first forecasting benchmark for our open-source dataset.
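For readers unfamiliar with the underlying distance, the standard Monte Carlo estimator of the sliced-Wasserstein distance works as follows: project both samples onto random unit directions, solve each one-dimensional optimal transport problem by sorting, and average. The following is a minimal generic sketch in Python/NumPy. It assumes equal-size, uniformly weighted samples, and it is not the authors' implementation: it omits their two approximations and their data selection rule. The function name `sliced_wasserstein` and its parameters are hypothetical.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    Hypothetical helper: assumes x and y are (n_samples, n_features) arrays
    with the same number of equally weighted samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts and dimensions")
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions uniformly on the unit sphere S^{d-1}.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto every direction: shape (n_samples, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In 1-D, optimal transport between equal-size uniform samples matches sorted values.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    # Average W_p^p over samples and directions, then take the p-th root.
    return float(np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p))


rng = np.random.default_rng(42)
a = rng.normal(size=(500, 3))              # reference sample
b = rng.normal(size=(500, 3))              # same distribution as a
c = rng.normal(loc=3.0, size=(500, 3))     # shifted distribution
print(sliced_wasserstein(a, b) < sliced_wasserstein(a, c))  # True
```

Sorting makes each projected problem cost O(n log n), which is what makes sliced distances cheaper than full optimal transport in high dimension. As a sanity check, translating a sample by a vector m gives a sliced 2-Wasserstein value close to |m|/sqrt(d).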
