Jan Arne Telle

h-index 28

6 papers

27 citations

Novelty 49%

AI Score 40

Ranked #71,170 of 194,257 authors (top 37%) · #15,950 in LG (top 40%)

6 Papers

6.7 · LG · May 13 · Code

Teaching and Learning under Deductive Errors

Jan Arne Telle, Brigt Håvardstun, Jose Hernandez-Orallo

Most models of machine teaching and learning assume that the learner makes no errors in its internal deductive inference. However, humans and large language models (LLMs) in few-shot learning regimes are two important examples of learners for which this assumption does not hold: they fail some consistency checks, and they can fail stochastically. In this paper we introduce a teaching and learning framework that takes these deductive errors into account. We focus on machine teaching, since different characterizations of the teacher can account for both machine teaching and machine learning. In an overhauled Probably Approximately Correct (PAC) setting, we study the problem in which, for some estimated error level, the teacher must find a PAC teaching set that with high probability leads the learner to guess an approximately correct hypothesis. We study the computational complexity of six different problems related to computing optimal PAC teaching sets. We give XP algorithms parameterized by the size of the teaching set, with tight runtime bounds under standard complexity assumptions such as the Exponential Time Hypothesis (ETH). These results are complemented by a small experimental study of which teaching and learning protocols best represent the observed behavior in some LLM teaching sessions.
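To make the PAC teaching problem concrete, here is a minimal Python sketch under assumptions that are not taken from the paper. It uses a hypothetical error model: the learner scans hypotheses in a fixed prior order and returns the first one that survives its consistency checks, and each check on an example the hypothesis mislabels is wrongly passed with probability `eps`. The teacher brute-forces all teaching sets of size at most `k_max`. This enumerates O(n^k) candidate sets, in the spirit of an XP algorithm parameterized by teaching-set size. All function names are illustrative.

```python
from itertools import combinations

def survive_prob(h, S, eps):
    # Hypothetical error model (not the paper's): a consistency check of h on an
    # example it mislabels is wrongly passed with probability eps, independently;
    # checks on examples that h labels correctly always pass.
    p = 1.0
    for x, y in S:
        if h(x) != y:
            p *= eps
    return p

def output_distribution(hypotheses, S, eps):
    # The learner scans hypotheses in a fixed prior order and returns the first
    # one that survives all checks on S; returns P(learner outputs h) for each h.
    probs, none_yet = [], 1.0  # none_yet: P(no earlier hypothesis was returned)
    for h in hypotheses:
        p = survive_prob(h, S, eps)
        probs.append(none_yet * p)
        none_yet *= 1.0 - p
    return probs

def disagreement(h, target, domain):
    # Fraction of the domain on which h and the target differ.
    return sum(h(x) != target(x) for x in domain) / len(domain)

def smallest_pac_teaching_set(hypotheses, target, domain, eps, approx, delta, k_max):
    # Try every labelled example set of size 0..k_max (O(n^k) candidates) and
    # return the first that makes the learner output a hypothesis within
    # `approx` of the target with probability at least 1 - delta.
    examples = [(x, target(x)) for x in domain]
    for k in range(k_max + 1):
        for S in combinations(examples, k):
            probs = output_distribution(hypotheses, S, eps)
            good = sum(p for h, p in zip(hypotheses, probs)
                       if disagreement(h, target, domain) <= approx)
            if good >= 1 - delta:
                return list(S), good
    return None  # no PAC teaching set of size <= k_max

domain = range(4)
# Threshold hypotheses h_t(x) = 1 iff x >= t, scanned in the order t = 0..4.
hypotheses = [lambda x, t=t: int(x >= t) for t in range(5)]
target = hypotheses[2]

print(smallest_pac_teaching_set(hypotheses, target, domain, eps=0.0,
                                approx=0.0, delta=0.05, k_max=4))  # ([(1, 0)], 1.0)
print(smallest_pac_teaching_set(hypotheses, target, domain, eps=0.1,
                                approx=0.0, delta=0.05, k_max=4))  # None
```

With error-free deduction (`eps=0.0`), the single negative example `(1, 0)` suffices. With `eps=0.1`, the same set reaches a success probability of only about 0.81. Under this toy model, no set of up to four examples reaches 0.95.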

3.8 · LG · Jul 11, 2023

MAP- and MLE-Based Teaching

Hans Ulrich Simon, Jan Arne Telle

Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c), where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (respectively, an MLE-learner) if it treats a collection S of observations as a random sample and returns the concept with the maximum a posteriori probability (respectively, the concept that maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained by ordered or unordered sampling, and with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims to find a smallest collection of observations that causes L to return c. This approach leads naturally to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are as follows. We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. For the important special case where concepts are subsets of a domain and observations are {0,1}-labeled examples, we obtain some additional results. First, we give a graph-theoretic characterization of the MAP- and MLE-teaching dimensions associated with an optimally parameterized MAP-learner. Several other results follow easily from this central one. For instance, the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds it by 1. Furthermore, these dimensions can be bounded from above by the so-called antichain number, the VC-dimension, and related combinatorial parameters. Moreover, they can be computed in polynomial time.
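The MAP- and MLE-learners defined above can be sketched in a few lines of Python. This is a toy instance under stated assumptions. The concept class, the priors `PRIOR` and the likelihood are hypothetical choices: the likelihood assumes a domain point drawn uniformly and labeled correctly by the concept. Only the ordered-with-replacement sampling mode is modeled, and ties count as failure. The teacher brute-forces the smallest multiset of observations that makes the learner return the target.

```python
from itertools import combinations_with_replacement
from math import prod

# Toy instance (hypothetical): concepts are subsets of the domain {0, 1},
# observations are labelled examples (x, y) with y in {0, 1}.
CONCEPTS = {"empty": set(), "c0": {0}, "all": {0, 1}}
PRIOR = {"empty": 0.2, "c0": 0.3, "all": 0.5}
OBSERVATIONS = [(x, y) for x in (0, 1) for y in (0, 1)]

def likelihood(c, z):
    # Assumed P(z|c): x drawn uniformly from the domain, labelled correctly by c.
    x, y = z
    return 0.5 if (x in CONCEPTS[c]) == (y == 1) else 0.0

def learner_output(S, use_prior):
    # MAP score P(c) * prod_z P(z|c), or MLE score prod_z P(z|c), treating S as an
    # ordered sample drawn with replacement. Ties count as failure (None).
    scores = {c: (PRIOR[c] if use_prior else 1.0) * prod(likelihood(c, z) for z in S)
              for c in CONCEPTS}
    best = max(scores.values())
    winners = [c for c, v in scores.items() if v == best]
    return winners[0] if len(winners) == 1 else None

def teaching_set(target, use_prior, k_max=4):
    # Smallest multiset of observations that makes the learner return `target`.
    for k in range(k_max + 1):
        for S in combinations_with_replacement(OBSERVATIONS, k):
            if learner_output(S, use_prior) == target:
                return S
    return None

print(teaching_set("c0", use_prior=True))   # MAP: ((1, 0),)
print(teaching_set("c0", use_prior=False))  # MLE: ((0, 1), (1, 0))
```

In this instance, the MAP-learner can be taught `c0` with the single observation `(1, 0)` because the prior breaks the tie with `empty`. The MLE-learner needs two observations because it has no prior with which to break ties among consistent concepts.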

6.4 · LG · Jan 23, 2024

When Redundancy Matters: Machine Teaching of Representations

Cèsar Ferri, Dario Garigliotti, Brigt Arve Toppe Håvardstun et al.

In traditional machine teaching, a teacher wants to teach a concept to a learner by means of a finite set of examples, the witness set. But concepts can have many equivalent representations. This redundancy strongly affects the search space, to the extent that teacher and learner may not be able to easily determine the equivalence class of each representation. In this common situation, instead of teaching concepts, we explore the idea of teaching representations. We work with several teaching schemas that exploit representation and witness size (Eager, Greedy and Optimal) and analyze the gains in teaching effectiveness for some representational languages (DNF expressions and Turing-complete P3 programs). Our theoretical and experimental results indicate that there are various types of redundancy, which are handled better by the Greedy schema introduced here than by the Eager schema, although both can be arbitrarily far from the Optimal schema. For P3 programs we found that witness sets are usually smaller than the programs they identify, which is an illuminating justification of why machine teaching from examples makes sense at all.
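The redundancy point can be illustrated by enumeration. The following Python sketch is a hypothetical illustration, not the paper's Eager, Greedy or Optimal schemas. It lists every DNF expression with at most two distinct, non-contradictory terms over two Boolean variables and groups the expressions by the concept (truth table) they denote.

```python
from itertools import combinations, product

N_VARS = 2
ASSIGNMENTS = list(product([0, 1], repeat=N_VARS))

# A term assigns each variable +1 (positive literal), -1 (negated literal)
# or 0 (absent); the all-absent (empty) term is excluded.
TERMS = [t for t in product([1, -1, 0], repeat=N_VARS) if any(t)]

def term_true(term, a):
    # A term holds if every present literal agrees with assignment a.
    return all(s == 0 or (s == 1) == (v == 1) for s, v in zip(term, a))

def truth_table(dnf):
    # The concept denoted by a DNF: its value on every assignment.
    return tuple(any(term_true(t, a) for t in dnf) for a in ASSIGNMENTS)

def group_by_concept(max_terms):
    # Map each concept to all DNF expressions (sets of terms) denoting it.
    classes = {}
    for k in range(max_terms + 1):
        for dnf in combinations(TERMS, k):
            classes.setdefault(truth_table(dnf), []).append(dnf)
    return classes

classes = group_by_concept(max_terms=2)
total = sum(len(reps) for reps in classes.values())
print(len(TERMS), total, len(classes))  # 8 37 16
```

Here 37 syntactic expressions collapse onto 16 concepts. A learner that searches over representations rather than concepts therefore explores a larger, redundant space, and the ratio grows quickly as variables and terms are added.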

4.1 · LG · May 13, 2025

Evaluating Simplification Algorithms for Interpretability of Time Series Classification

Brigt Håvardstun, Felix Marti-Perez, Cèsar Ferri et al.

In this work, we introduce metrics to evaluate the use of simplified time series in the context of interpretability of a TSC -- a Time Series Classifier. Such simplifications are important because time series data, in contrast to text and image data, are not intuitively understandable to humans. These metrics relate to the complexity of the simplifications -- how many segments they contain -- and to their loyalty -- how likely they are to preserve the classification of the original time series. We focus on simplifications that select a subset of the original data points, and show that these selected points typically have high Shapley values, thereby aiding interpretability. We employ these metrics to experimentally evaluate four distinct simplification algorithms, across several TSC algorithms and across datasets of varying characteristics, from seasonal or stationary to short or long. We subsequently perform a human-grounded evaluation with forward simulation, which also confirms the practical utility of the introduced metrics for evaluating simplifications in the context of TSC interpretability. Our findings are summarized in a framework for deciding, for a given TSC, whether the various simplifications are likely to aid its interpretability.
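The two metrics can be sketched in Python with NumPy. The sketch relies on stand-ins that are assumptions, not the paper's setup. For simplification it uses the Ramer-Douglas-Peucker algorithm, a well-known method that keeps a subset of the original points; here it measures vertical rather than perpendicular distance for simplicity. In place of a trained TSC it uses a hypothetical threshold classifier. Complexity is the number of linear segments, and loyalty is the fraction of series whose predicted class is unchanged after simplification.

```python
import numpy as np

def rdp_indices(y, tol, lo=0, hi=None):
    # Ramer-Douglas-Peucker on a series (x = time index), using vertical distance
    # to the chord as a simplifying assumption. Returns kept indices, sorted.
    if hi is None:
        hi = len(y) - 1
    if hi - lo < 2:
        return [lo, hi]
    x = np.arange(lo + 1, hi)
    chord = y[lo] + (y[hi] - y[lo]) * (x - lo) / (hi - lo)
    dev = np.abs(y[lo + 1:hi] - chord)  # deviations of interior points only
    i = int(np.argmax(dev))
    if dev[i] <= tol:
        return [lo, hi]
    mid = lo + 1 + i
    return rdp_indices(y, tol, lo, mid)[:-1] + rdp_indices(y, tol, mid, hi)

def simplify(y, tol):
    # Piecewise-linear series through the kept points, plus its complexity.
    idx = rdp_indices(y, tol)
    approx = np.interp(np.arange(len(y)), idx, y[idx])
    return approx, len(idx) - 1  # complexity = number of segments

def loyalty(series, classifier, tol):
    # Fraction of series whose predicted class survives simplification.
    kept = [classifier(simplify(y, tol)[0]) == classifier(y) for y in series]
    return sum(kept) / len(kept)

def classifier(s):
    # Hypothetical stand-in for a trained TSC.
    return int(s.mean() > 0.5)

y = np.array([0, 0, 0, 5, 0, 0, 0], dtype=float)
print(rdp_indices(y, tol=0.5))  # [0, 2, 3, 4, 6]
print(loyalty([y], classifier, tol=0.5), loyalty([y], classifier, tol=10.0))  # 1.0 0.0
```

A loose tolerance yields a one-segment simplification (low complexity) that flips the prediction (loyalty 0), which is the kind of trade-off the two metrics are meant to expose.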

3.1 · AI · Apr 19, 2018

Finite Biased Teaching with Infinite Concept Classes

Jose Hernandez-Orallo, Jan Arne Telle

We investigate the teaching of infinite concept classes through the effect of the learning bias (which the learner uses to prefer some concepts over others and the teacher uses to devise the teaching examples) and the sampling bias (which determines how the concepts are sampled from the class). We analyse two important classes: Turing machines and finite-state machines. We derive bounds for the biased teaching dimension when the learning bias is derived from a complexity measure (Kolmogorov complexity and minimal number of states, respectively) and analyse the sampling distributions that lead to finite expected biased teaching dimensions. We highlight the existing trade-off between the bound and the representativeness of the sample, and its implications for understanding what teaching rich concepts to machines entails.
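Infinite classes cannot be enumerated, but the interplay between the two biases can be seen on a finite toy class. In this Python sketch, the concepts are the subsets of a three-element domain. The learning bias is a hypothetical complexity order (set size, then lexicographic) standing in for Kolmogorov complexity or number of states. The learner returns the first concept in that order that is consistent with the examples. The expected biased teaching dimension then depends on the sampling distribution passed as `weight`.

```python
from itertools import combinations

DOMAIN = (0, 1, 2)
# Hypothetical learning bias: concepts (subsets of DOMAIN) ordered by size,
# then lexicographically; the learner returns the first consistent concept.
CONCEPTS = sorted(
    (frozenset(s) for k in range(len(DOMAIN) + 1) for s in combinations(DOMAIN, k)),
    key=lambda c: (len(c), sorted(c)),
)

def learner(examples):
    return next(c for c in CONCEPTS
                if all((x in c) == bool(y) for x, y in examples))

def biased_td(target):
    # Smallest number of examples (labelled by the target) for which the
    # biased learner returns the target.
    pool = [(x, int(x in target)) for x in DOMAIN]
    for k in range(len(pool) + 1):
        if any(learner(S) == target for S in combinations(pool, k)):
            return k

def expected_td(weight):
    # Expected biased teaching dimension when concepts are sampled with
    # probability proportional to weight(c) (the sampling bias).
    z = sum(weight(c) for c in CONCEPTS)
    return sum(weight(c) * biased_td(c) for c in CONCEPTS) / z

print([biased_td(c) for c in CONCEPTS])        # [0, 1, 1, 1, 2, 2, 2, 3]
print(expected_td(lambda c: 2.0 ** -len(c)))  # 1.0
print(expected_td(lambda c: 1.0))             # 1.5
```

Sampling concepts with probability proportional to 2^(-|c|), so that simpler concepts are drawn more often, gives an expected teaching dimension of 1.0 here, against 1.5 under uniform sampling.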

4.3 · DS · Feb 26, 2014

Solving MaxSAT and #SAT on structured CNF formulas

Sigve Hortemo Sæther, Jan Arne Telle, Martin Vatshelle

In this paper we propose a structural parameter of CNF formulas and use it to identify instances of weighted MaxSAT and #SAT that can be solved in polynomial time. Given a CNF formula, we say that a set of clauses is precisely satisfiable if there is some complete assignment satisfying exactly these clauses and no others. Let the ps-value of the formula be the number of precisely satisfiable sets of clauses. Applying the notion of branch decompositions to CNF formulas and using the ps-value as cut function, we define the ps-width of a formula. For a formula given with a decomposition of polynomial ps-width, we give dynamic programming algorithms solving weighted MaxSAT and #SAT in polynomial time. Combining this with results of 'Belmonte and Vatshelle, Graph classes with structured neighborhoods and algorithmic applications, Theor. Comput. Sci. 511: 54-65 (2013)', we get polynomial-time algorithms solving weighted MaxSAT and #SAT for some classes of structured CNF formulas. For example, we get $O(m^2(m + n)s)$ algorithms for formulas $F$ with $m$ clauses, $n$ variables and size $s$, if $F$ has a linear ordering of the variables and clauses such that, for any variable $x$ occurring in clause $C$, if $x$ appears before $C$ then any variable between them also occurs in $C$, and if $C$ appears before $x$ then $x$ also occurs in any clause between them. Note that the class of incidence graphs of such formulas does not have bounded clique-width.
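Both notions in the abstract can be checked by brute force on small formulas. The following Python sketch is exponential in the number of variables, so it illustrates the definitions rather than the paper's dynamic programming over ps-width decompositions. For each precisely satisfiable set of clauses, it counts how many assignments satisfy exactly that set. The ps-value is the number of such sets, #SAT is the count for the set of all clauses, and weighted MaxSAT is the heaviest set. A second helper, `has_interval_ordering` (a hypothetical name), tests the linear-ordering condition for a given ordering. Clauses use DIMACS-style signed integers.

```python
from collections import Counter
from itertools import product

def ps_table(clauses, n_vars):
    # Map each precisely satisfiable clause set (frozenset of clause indices)
    # to the number of complete assignments satisfying exactly those clauses.
    table = Counter()
    for bits in product([False, True], repeat=n_vars):
        sat = frozenset(i for i, clause in enumerate(clauses)
                        if any(bits[abs(l) - 1] == (l > 0) for l in clause))
        table[sat] += 1
    return table

def has_interval_ordering(clauses, order):
    # `order` lists ("v", var) and ("c", clause_index) items. For every variable
    # x in clause C: if x comes first, every variable between them must occur
    # in C; if C comes first, x must occur in every clause between them.
    pos = {item: i for i, item in enumerate(order)}
    var_sets = [{abs(l) for l in clause} for clause in clauses]
    for ci, vs in enumerate(var_sets):
        pc = pos[("c", ci)]
        for x in vs:
            px = pos[("v", x)]
            for kind, idx in order[min(px, pc) + 1:max(px, pc)]:
                if px < pc and kind == "v" and idx not in vs:
                    return False
                if pc < px and kind == "c" and x not in var_sets[idx]:
                    return False
    return True

# F = (x1 or x2) and (not x1) and (not x2)
clauses = [[1, 2], [-1], [-2]]
table = ps_table(clauses, n_vars=2)
ps_value = len(table)                                     # 4
n_models = table[frozenset(range(len(clauses)))]          # #SAT: 0
weights = [3, 1, 1]
max_sat = max(sum(weights[i] for i in s) for s in table)  # 4
```

The table of precisely satisfiable sets is the kind of information the abstract's dynamic programming keeps per cut. The brute force above simply computes it for the whole formula at once.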