Marina Sapir

LG | 5 papers | 3 citations

5 Papers

AI | Jun 15, 2022
Theory of Machine Learning with Limited Data

Marina Sapir

Application of machine learning may be understood as deriving new knowledge for practical use by explaining accumulated observations, the training set. Peirce used the term abduction for this kind of inference. Here I formalize the concept of abduction for real-valued hypotheses and show that 14 of the most popular textbook ML learners (every learner I tested), covering classification, regression and clustering, implement this concept of abductive inference. The approach is proposed as an alternative to statistical learning theory, which requires the impractical assumption of an indefinitely increasing training set for its justification.

LG | Jan 3, 2025
Practical machine learning is learning on small samples

Marina Sapir

Based on limited observations, machine learning discerns a dependence that is expected to hold in the future. What makes this possible? Statistical learning theory imagines an indefinitely increasing training sample to justify its approach. In reality, there is neither infinite time nor even an infinite general population for learning. Here I argue that practical machine learning rests on an implicit assumption that the underlying dependence is relatively "smooth": likely, there are no abrupt differences in feedback between cases with close data points. From this point of view, learning should involve selecting a hypothesis that approximates the training set "smoothly". I formalize this as the Practical learning paradigm. The paradigm includes terminology and rules for describing learners. Popular learners (local smoothing, k-NN, decision trees, Naive Bayes, SVM for classification and for regression) are shown here to be implementations of this paradigm.
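As a rough illustration of the smoothness assumption (not the paper's formalization), k-NN regression predicts the feedback at a new point as the average feedback of its nearest training points, so cases with close data points receive close predictions. A minimal sketch in plain Python; the function name `knn_predict`, the Euclidean distance and the unweighted mean are assumptions made for illustration.

```python
import math
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]],
                train_y: Sequence[float],
                query: Sequence[float],
                k: int = 3) -> float:
    """Predict feedback at `query` as the mean feedback of its k nearest training points.

    Hypothetical helper: it encodes the smoothness assumption that points
    close in data space should have similar feedback.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the training-set size")
    # Euclidean distance from the query to every training point, paired with its feedback.
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Sort by distance only; Python's sort is stable, so ties keep their original order.
    dists.sort(key=lambda pair: pair[0])
    # Unweighted average of the k closest feedback values.
    return sum(y for _, y in dists[:k]) / k
```

With training points 0, 1, 2 and 10 on a line (feedback equal to the coordinate), a query at 1.1 with k = 3 averages the three nearby points and ignores the distant one, giving 1.0.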

LG | Jun 16, 2020
Logic of Machine Learning

Marina Sapir

The main question is: why and how can we ever predict based on a finite sample? Statistical learning theory does not answer this question. Here, I suggest that prediction requires belief in the "predictability" of the underlying dependence, and that learning involves a search for the hypothesis that, given the observations, violates these beliefs the least. The measure of these violations ("errors") for given data, hypothesis and a particular type of predictability beliefs is formalized as the concept of incongruity in the modal Logic of Observations and Hypotheses (LOH). Using many popular textbook learners as examples (from hierarchical clustering to k-NN and SVM), I show that each of them minimizes its own version of incongruity. In addition, the concept of incongruity is shown to be flexible enough to formalize some important data analysis problems that are not considered part of ML.

LG | Jul 27, 2018
Learnable: Theory vs Applications

Marina Sapir

Two different views of the machine learning problem, Applied learning (machine learning with business applications) and Agnostic PAC learning, are formalized and compared here. I show that, under some conditions, the theory of PAC learnability provides a way to solve the Applied learning problem. However, the theory requires training sets so large that learning would be practically useless. I suggest shedding some theoretical misconceptions about learning to bring the theory closer to the needs and experience of practitioners.
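To give a sense of scale for the claim about training-set size, the sketch below evaluates the standard textbook sufficient sample size for agnostic PAC learning of a finite hypothesis class H by empirical risk minimization, m ≥ 2 ln(2|H|/δ) / ε². This is a generic bound chosen here for illustration, not the analysis in the paper; the function name and the example values (|H| = 10^6, δ = 0.05) are hypothetical.

```python
import math


def agnostic_pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Sufficient training-set size for agnostic PAC learning of a finite class via ERM.

    Uses the textbook uniform-convergence bound
        m >= 2 * ln(2 * |H| / delta) / epsilon**2,
    which guarantees, with probability at least 1 - delta, that the ERM
    hypothesis has true error within epsilon of the best hypothesis in H.
    """
    if hypothesis_count < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need |H| >= 1 and epsilon, delta in (0, 1)")
    return math.ceil(2 * math.log(2 * hypothesis_count / delta) / epsilon ** 2)


# Hypothetical example: one million hypotheses, 95% confidence, shrinking epsilon.
for eps in (0.1, 0.05, 0.01):
    print(eps, agnostic_pac_sample_size(10**6, eps, 0.05))
```

Because m grows as 1/ε², tightening the accuracy from ε = 0.1 to ε = 0.01 multiplies the required sample by 100, from about 3,500 to about 350,000 examples.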

AI | Jun 26, 2017
Optimal choice: new machine learning problem and its solution

Marina Sapir

The task of learning to pick a single preferred example out of a finite set of examples, an "optimal choice problem", is a supervised machine learning problem with complex, structured input. Problems of optimal choice often emerge in various practical applications. We formalize the problem and show that it does not satisfy the assumptions of statistical learning theory, yet it can be solved efficiently in some cases. We propose two approaches to solve the problem. Both reach good solutions on real-life data from a signal processing application.
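The abstract does not spell out its two approaches. As a hypothetical baseline only, the sketch below shows the shape of the data (each instance is a variable-size set of candidate feature vectors plus the index of the preferred one) and a standard perceptron-style learner for choices: a linear score ranks the candidates, and on a mistake the weights move toward the preferred candidate and away from the wrongly chosen one. The names `choose` and `train_choice_perceptron` are illustrative assumptions, not the paper's method.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# One training instance: a finite set of candidate examples and the index of the preferred one.
ChoiceInstance = Tuple[List[Vector], int]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def choose(w: Sequence[float], candidates: Sequence[Vector]) -> int:
    """Pick the candidate with the highest linear score (the first one wins ties)."""
    scores = [dot(w, c) for c in candidates]
    return scores.index(max(scores))


def train_choice_perceptron(data: Sequence[ChoiceInstance],
                            n_features: int, epochs: int = 20) -> Vector:
    """Hypothetical baseline: perceptron updates on wrong choices until a pass has no mistakes."""
    w = [0.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for candidates, best in data:
            picked = choose(w, candidates)
            if picked != best:
                mistakes += 1
                # Move toward the preferred candidate, away from the wrong pick.
                w = [wi + b - p for wi, b, p in zip(w, candidates[best], candidates[picked])]
        if mistakes == 0:
            break
    return w


# Toy data: instances may contain different numbers of candidates.
toy = [
    ([[1.0, 0.0], [0.0, 1.0]], 0),
    ([[0.0, 1.0], [2.0, 0.0]], 1),
    ([[0.5, 0.5], [3.0, 0.0], [0.0, 4.0]], 1),
]
weights = train_choice_perceptron(toy, n_features=2)
print(weights, [choose(weights, c) for c, _ in toy])  # prints [2.0, -1.0] [0, 1, 1]
```

Scoring each candidate separately lets a single weight vector handle sets of any size, which is one simple way to cope with the structured input the abstract describes.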