Udi Boker

LG
3 papers
1 citation
Novelty 60%
AI Score 42

3 Papers

FL Jul 11, 2024
History-Determinism vs Fair Simulation

Udi Boker, Thomas A. Henzinger, Karoliina Lehtinen et al.

An automaton is history-deterministic if its nondeterminism can be resolved on the fly, only using the prefix of the word read so far. This mild form of nondeterminism has attracted particular attention for its applications in synthesis problems. An automaton $A$ is guidable with respect to a class $C$ of automata if it can fairly simulate every automaton in $C$ whose language is contained in that of $A$. In other words, guidable automata are those for which inclusion and simulation coincide, making them particularly interesting for model-checking. We study the connection between these two notions, and specifically the question of when they coincide. For classes of automata on which they do, deciding guidability, an otherwise challenging decision problem, reduces to deciding history-determinism, a problem that is starting to be well-understood for many classes. We provide a selection of sufficient criteria for a class of automata to guarantee the coincidence of the notions, and use them to show that the notions coincide for the most common automata classes, among which are $ω$-regular automata and many infinite-state automata with safety and reachability acceptance conditions, including vector addition systems with states, one-counter nets, pushdown-, Parikh-, and timed-automata. We also demonstrate that history-determinism and guidability do not always coincide, for example, for the classes of timed automata with a fixed number of clocks.

31.1 LG Apr 12
WOODELF-HD: Efficient Background SHAP for High-Depth Decision Trees

Ron Wettenstein, Alexander Nadel, Udi Boker

Decision-tree ensembles are a cornerstone of predictive modeling, and SHAP is a standard framework for interpreting their predictions. Among its variants, Background SHAP offers high accuracy by modeling missing features using a background dataset. Historically, this approach did not scale well, as the time complexity for explaining n instances using m background samples included an O(mn) component. Recent methods such as Woodelf and PLTreeSHAP reduce this to O(m+n), but introduce a preprocessing bottleneck that grows as 3^D with tree depth D, making them impractical for deep trees. We address this limitation with WoodelfHD, a Woodelf extension that reduces the 3^D factor to 2^D. The key idea is a Strassen-like multiplication scheme that exploits the structure of Woodelf matrices, reducing matrix-vector multiplication from O(k^2) to O(k*log(k)) via a fully vectorized, non-recursive implementation. In addition, we merge path nodes with identical features, reducing cache size and memory usage. When running on standard environments, WoodelfHD enables exact Background SHAP computation for trees with depths up to 21, where previous methods fail due to excessive memory usage. For ensembles of depths 12 and 15, it achieves speedups of 33x and 162x, respectively, over the state-of-the-art.
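
To give a sense of the gap between the 3^D and 2^D factors named in the abstract, the following minimal sketch evaluates both at the depths the abstract mentions (12, 15, and 21). It illustrates growth rates only and is not a model of actual runtime or memory; the function name `preprocessing_factors` is hypothetical and not part of any Woodelf code.

```python
def preprocessing_factors(depth: int) -> tuple[int, int, float]:
    """Return (3^D, 2^D, ratio) for a tree of the given depth D.

    Illustrative arithmetic only: these are the exponential factors quoted
    in the abstract, not measured costs.
    """
    three_pow = 3 ** depth
    two_pow = 2 ** depth
    return three_pow, two_pow, three_pow / two_pow


if __name__ == "__main__":
    # Depths taken from the abstract: 12, 15 (benchmarked), 21 (maximum reported).
    for d in (12, 15, 21):
        three_pow, two_pow, ratio = preprocessing_factors(d)
        print(f"D={d}: 3^D={three_pow}, 2^D={two_pow}, ratio={ratio:.1f}")
```

The ratio (3/2)^D itself grows exponentially, which is consistent with the abstract's statement that the 3^D methods become impractical for deep trees.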

20.8LGMay 14
Woodelf++: A Fast and Unified Partial Dependence Plot Algorithm for Decision Tree Ensembles

Ron Wettenstein, Alexander Nadel, Udi Boker

Partial Dependence Plots (PDPs) visualize how changes in a single feature affect the average model prediction. They are widely used in practice to interpret decision tree ensembles and other machine learning models. Joint-PDPs extend this idea to pairs of features, revealing their combined effect. Partial Dependence Interaction Values (PDIVs) measure feature interactions. The Any-Order-PDIVs task computes these interactions for every feature subset across all rows of the dataset. We introduce Woodelf++, a unified and efficient approach for computing all these useful explainability tools on decision tree ensembles, building on Woodelf, an algorithm for efficient SHAP computation. By deriving suitable metrics over pseudo-Boolean functions, Woodelf++ can compute PDPs (exact and approximate), Joint-PDPs, and Any-Order-PDIVs in a unified framework. Our method delivers substantial complexity improvements over the state of the art, including an exponential gain for Any-Order-PDIVs. Additionally, we introduce and efficiently compute Full PDPs, which leverage the model's split thresholds to faithfully capture its behavior across all possible feature values. Woodelf++ is implemented in pure Python and supports GPU acceleration. On a dataset with 400,000 rows, Woodelf++ computes PDP and Joint-PDP up to 6x faster than the state of the art and up to five orders of magnitude faster than scikit-learn. For Any-Order-PDIVs, the gap is even larger: Woodelf++ computes all interaction values in 5 minutes, while the state of the art is estimated to require over 1,000,000 years.
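
For readers unfamiliar with the quantity being computed, here is a minimal brute-force sketch of a single-feature PDP, following the definition in the abstract (the average model prediction as one feature is varied). This is the naive baseline definition, not the Woodelf++ algorithm; `brute_force_pdp` and `toy_predict` are hypothetical names used only for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def brute_force_pdp(predict, X, feature, grid):
    """Naive PDP: for each grid value, overwrite one feature in every row
    and average the model's predictions. Costs one full pass over X per
    grid value."""
    X = np.asarray(X, dtype=float)
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v  # force the chosen feature to the grid value
        values.append(float(np.mean(predict(X_mod))))
    return np.array(values)


def toy_predict(X):
    # Stand-in for a trained model (assumption): a simple linear function.
    return 2.0 * X[:, 0] + X[:, 1]


X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
pdp = brute_force_pdp(toy_predict, X, feature=0, grid=[0.0, 1.0])
print(pdp)
```

Any callable that maps a 2-D array to a 1-D array of predictions can be passed as `predict`, for example the `predict` method of a fitted tree ensemble.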