Miloš Simić

ML · 3 papers · 9 citations

3 Papers

ML · Oct 21, 2020
How to Control the Error Rates of Binary Classifiers

Miloš Simić

The traditional binary classification framework constructs classifiers that may have good accuracy but whose false positive and false negative error rates are not under the user's control. In many cases, one type of error is more severe, and only classifiers whose rate for that error is below a predefined threshold are acceptable. In this study, we combine binary classification with statistical hypothesis testing to control the target error rate of already trained classifiers. In particular, we show how to turn binary classifiers into statistical tests, calculate the classification p-values, and use them to limit the target error rate.
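
A minimal Python sketch of the general idea, assuming a score-based construction borrowed from conformal prediction rather than the paper's exact procedure: classifier scores of held-out negative examples (a hypothetical calibration set) act as the null distribution, each new score receives an empirical p-value, and an example is labeled positive only when that p-value is at most α. The function names are illustrative.

```python
from typing import Sequence


def empirical_p_value(score: float, null_scores: Sequence[float]) -> float:
    """Empirical p-value of `score` under the null "the example is negative".

    Assumptions: `null_scores` are scores of held-out negative examples, and
    higher scores indicate the positive class. The +1 terms keep the p-value
    valid when the new score and the calibration scores are exchangeable.
    """
    at_least_as_extreme = sum(1 for s in null_scores if s >= score)
    return (1 + at_least_as_extreme) / (len(null_scores) + 1)


def classify_positive(score: float, null_scores: Sequence[float], alpha: float) -> bool:
    """Predict positive (reject the null) only if the p-value is <= alpha.

    Under exchangeability, a new negative example is then labeled positive
    with probability at most alpha, so the false positive rate is bounded.
    """
    return empirical_p_value(score, null_scores) <= alpha


# Usage: four held-out negative scores and a target false positive rate of 0.2.
calibration_negatives = [0.1, 0.2, 0.3, 0.4]
print(empirical_p_value(0.35, calibration_negatives))  # 0.4
print(classify_positive(0.5, calibration_negatives, alpha=0.2))  # True
```

To bound the false negative rate instead, the roles swap in this sketch: positive examples form the calibration set and the score comparison flips direction. Note that with n calibration scores the smallest attainable p-value is 1/(n + 1), so very small α requires a larger calibration set.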

ML · Sep 29, 2020
Testing for Normality with Neural Networks

Miloš Simić

In this paper, we treat the problem of testing for normality as a binary classification problem and construct a feedforward neural network that can successfully detect normal distributions by inspecting small samples drawn from them. The numerical experiments, conducted on small samples with no more than 100 elements, indicated that the neural network we trained was more accurate and far more powerful than the most frequently used and most powerful standard tests of normality (Shapiro-Wilk, Anderson-Darling, Lilliefors, and Jarque-Bera), as well as the kernel goodness-of-fit tests. The neural network achieved an AUROC score of almost 1, which is close to that of a perfect binary classifier. Additionally, the network's accuracy was higher than 96% on a set of larger samples with 250 to 1000 elements. Since normality of the data is an assumption of numerous techniques for analysis and inference, the neural network constructed in this study has very high potential for use in the everyday practice of statistics, data analysis, and machine learning in both science and industry.
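
For reference, the Jarque-Bera test, one of the standard baselines named in the abstract, needs only the sample skewness and kurtosis. The following is a minimal standard-library sketch of its statistic; the function name is ours, and it assumes a sample with nonzero variance.

```python
from typing import Sequence


def jarque_bera(sample: Sequence[float]) -> float:
    """Jarque-Bera statistic JB = n/6 * (S^2 + (K - 3)^2 / 4).

    S is the sample skewness and K the sample kurtosis, both computed from
    biased (1/n) central moments. Under normality, JB is asymptotically
    chi-squared with 2 degrees of freedom; large values suggest non-normality.
    Assumes the sample has nonzero variance.
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


print(round(jarque_bera([1, 2, 3, 4, 5]), 4))  # 0.3521
```

At the 5% level, JB is typically compared with about 5.99, the 95th percentile of the chi-squared distribution with 2 degrees of freedom. That approximation is asymptotic and is known to be rough for small samples.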

AI · Mar 29, 2019
How to Estimate the Ability of a Metaheuristic Algorithm to Guide Heuristics During Optimization

Miloš Simić

Metaheuristics are general methods that guide the application of concrete heuristic(s) to problems that are too hard to solve with exact algorithms. However, even though a growing body of literature has been devoted to their statistical evaluation, the approaches proposed so far can assess only the coupled effects of metaheuristics and heuristics. They reveal nothing about how efficiently the examined metaheuristic guides its subordinate heuristic(s), nor do they show how much the heuristic component of the combined algorithm contributes to the overall performance. In this paper, we propose a simple yet effective methodology for doing so: we derive a naive, placebo metaheuristic from the one being studied and compare the distributions of chosen performance metrics for the two methods. We propose three measures of difference between the two distributions. These measures, which we call BER values (benefit, equivalence, risk), are based on a preselected threshold of practical significance, which represents the minimal difference between two performance scores required for them to be considered practically different. We illustrate the usefulness of our methodology on the example of Simulated Annealing, the Boolean Satisfiability Problem, and the Flip heuristic.
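
The abstract does not spell out how the BER values are computed. One plausible reading, which is purely our assumption and not necessarily the paper's definition, estimates over all pairs of runs how often the studied metaheuristic beats the placebo by more than the threshold (benefit), stays within it (equivalence), or falls behind by more than it (risk). A minimal sketch with a hypothetical `ber_values` function:

```python
from itertools import product
from typing import Dict, Sequence


def ber_values(
    studied: Sequence[float], placebo: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Hypothetical pairwise BER estimates from two samples of performance scores.

    Assumes higher scores are better and compares every (studied, placebo) pair:
    - benefit:     studied beats placebo by more than `threshold`
    - equivalence: the absolute difference is at most `threshold`
    - risk:        placebo beats studied by more than `threshold`
    The three fractions sum to 1.
    """
    counts = {"benefit": 0, "equivalence": 0, "risk": 0}
    for x, y in product(studied, placebo):
        diff = x - y
        if diff > threshold:
            counts["benefit"] += 1
        elif diff < -threshold:
            counts["risk"] += 1
        else:
            counts["equivalence"] += 1
    total = len(studied) * len(placebo)
    return {name: count / total for name, count in counts.items()}


print(ber_values([10, 12], [10, 11], threshold=0.5))
# {'benefit': 0.5, 'equivalence': 0.25, 'risk': 0.25}
```

For a metric where lower is better, such as a count of unsatisfied clauses in a SAT instance, negating both samples before the call keeps the benefit and risk labels pointing the intended way in this sketch.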