72.2PLMar 17
Types, equations, dimensions and the Pi theoremNicola Botta, Patrik Jansson
The languages of mathematical physics and modelling are endowed with a rich ``grammar of dimensions'' that common abstractions of programming languages fail to represent. We propose a dependently typed domain-specific language (embedded in Idris) that captures this grammar. We apply it to formalize basic notions of dimensional analysis: those of dimension function, physical quantity, homomorphic measurement, the covariance principle and Buckingham's Pi theorem. We hope that the language makes mathematical physics more accessible to computer scientists and functional programming more palatable to modellers and physicists.
58.4OCMar 14
Optimization under uncertainty: understanding orders and testing programs with specificationsPatrik Jansson, Nicola Botta, Tim Richter
One of the most ubiquitous problems in optimization is that of finding all the elements of a finite set at which a function $f$ attains its minimum (or maximum). When the codomain of $f$ is equipped with a total order, it is easy to specify, implement, and verify generic solutions to this problem. But what if $f$ is affected by uncertainties? What if one seeks values that minimize more than one objective, or if $f$ does not return a single result but a set of possible results, or even a probability distribution? Such situations are common in climate science, economics, and engineering. Developing trustworthy solution methods for optimization under uncertainty requires formulating and answering these questions rigorously, including deciding which order relations to apply in different cases. We show how functional programming can support this task, and apply it to specify and test solution methods for cases where optimization is affected by two conceptually different kinds of uncertainty: value and functorial uncertainty. We analyze the interplay of orders in these contexts, demonstrate how standard minimization generalizes to partial orders in the multi-objective setting and how it can be lifted via monotonicity conditions to handle functorial uncertainty.
LGJan 2, 2022
ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution FunctionsZheng Li, Yue Zhao, Xiyang Hu et al.
Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical-Cumulative-distribution-based Outlier Detection), which is inspired by the fact that outliers are often the "rare events" that appear in the tails of a distribution. In a nutshell, ECOD first estimates the underlying distribution of the input data in a nonparametric fashion by computing the empirical cumulative distribution per dimension of the data. ECOD then uses these empirical distributions to estimate tail probabilities per dimension for each data point. Finally, ECOD computes an outlier score of each data point by aggregating estimated tail probabilities across dimensions. Our contributions are as follows: (1) we propose a novel outlier detection method called ECOD, which is both parameter-free and easy to interpret; (2) we perform extensive experiments on 30 benchmark datasets, where we find that ECOD outperforms 11 state-of-the-art baselines in terms of accuracy, efficiency, and scalability; and (3) we release an easy-to-use and scalable (with distributed support) Python implementation for accessibility and reproducibility.
MLSep 20, 2020
COPOD: Copula-Based Outlier DetectionZheng Li, Yue Zhao, Nicola Botta et al.
Outlier detection refers to the identification of rare items that are deviant from the general data distribution. Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability. As a remedy, we present a novel outlier detection algorithm called COPOD, which is inspired by copulas for modeling multivariate data distribution. COPOD first constructs an empirical copula, and then uses it to predict tail probabilities of each given data point to determine its level of "extremeness". Intuitively, we think of this as calculating an anomalous p-value. This makes COPOD both parameter-free, highly interpretable, and computationally efficient. In this work, we make three key contributions, 1) propose a novel, parameter-free outlier detection algorithm with both great performance and interpretability, 2) perform extensive experiments on 30 benchmark datasets to show that COPOD outperforms in most cases and is also one of the fastest algorithms, and 3) release an easy-to-use Python implementation for reproducibility.