Peter Ruckdeschel

2 Papers

LG Oct 29, 2025
flowengineR: A Modular and Extensible Framework for Fair and Reproducible Workflow Design in R

Maximilian Willer, Peter Ruckdeschel

flowengineR is an R package designed to provide a modular and extensible framework for building reproducible algorithmic workflows for general-purpose machine learning pipelines. It is motivated by the rapidly evolving field of algorithmic fairness, where new metrics, mitigation strategies, and machine learning methods continuously emerge. A central challenge in fairness, but also far beyond, is that existing toolkits either focus narrowly on single interventions or treat reproducibility and extensibility as secondary considerations rather than core design principles. flowengineR addresses this by introducing a unified architecture of standardized engines for data splitting, execution, preprocessing, training, inprocessing, postprocessing, evaluation, and reporting. Each engine encapsulates one methodological task yet communicates via a lightweight interface, ensuring workflows remain transparent, auditable, and easily extensible. Although implemented in R, flowengineR builds on ideas from workflow languages (CWL, YAWL), graph-oriented visual programming languages (KNIME), and R frameworks (BatchJobs, batchtools). Its emphasis, however, is less on orchestrating engines for resilient parallel execution than on the straightforward setup and management of distinct engines as data structures. This orthogonalization enables distributed responsibilities, independent development, and streamlined integration. In the fairness context, by structuring fairness methods as interchangeable engines, flowengineR lets researchers integrate, compare, and evaluate interventions across the modeling pipeline. At the same time, the architecture generalizes to explainability, robustness, and compliance metrics without core modifications. While motivated by fairness, it ultimately provides a general infrastructure for any workflow context where reproducibility, transparency, and extensibility are essential.
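The idea of engines as plain data structures that communicate through a lightweight interface can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the flowengineR API: the names `Engine`, `run_workflow`, and the three toy engines are hypothetical, and the shared-dictionary interface is an assumption about what "lightweight" might look like.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass(frozen=True)
class Engine:
    """Hypothetical engine: one named task operating on a shared state."""
    name: str
    run: Callable[[State], State]  # reads the state, returns only its updates


def run_workflow(engines: List[Engine], initial: State) -> State:
    """Run engines in order and record which keys each one wrote."""
    state = dict(initial)  # copy so the caller's input is not mutated
    log = []
    for engine in engines:
        updates = engine.run(state)
        state.update(updates)
        log.append((engine.name, sorted(updates)))
    state["log"] = log
    return state


# Toy engines (assumptions for illustration only).
def split(state: State) -> State:
    data = state["data"]
    cut = int(len(data) * 0.75)  # first 75% for training
    return {"train": data[:cut], "test": data[cut:]}


def train_mean(state: State) -> State:
    train = state["train"]
    return {"model": sum(train) / len(train)}  # constant predictor


def evaluate_mse(state: State) -> State:
    errors = [(y - state["model"]) ** 2 for y in state["test"]]
    return {"mse": sum(errors) / len(errors)}


workflow = [
    Engine("split", split),
    Engine("train", train_mean),
    Engine("evaluate", evaluate_mse),
]
result = run_workflow(workflow, {"data": [1.0, 2.0, 3.0, 4.0]})
print(result["mse"], result["log"])
```

Because each engine is just a value in a list, swapping one intervention for another amounts to replacing a single list element. The log of written keys is one simple way to keep a run auditable.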

ST Sep 24, 2019
The column measure and Gradient-Free Gradient Boosting

Tino Werner, Peter Ruckdeschel

Sparse model selection by structural risk minimization leads to a set of a few predictors, ideally a subset of the true predictors. This selection clearly depends on the underlying loss function $\tilde L$. For linear regression with square loss, the particular (functional) Gradient Boosting variant $L_2$-Boosting excels in computational efficiency even for very large predictor sets, while still providing suitable estimation consistency. For more general loss functions, functional gradients are not always easily accessible or, as in the case of continuous ranking, need not even exist. To close this gap, starting from column selection frequencies obtained from $L_2$-Boosting, we introduce a loss-dependent "column measure" $\nu^{(\tilde L)}$ which mathematically describes variable selection. The fact that certain variables relevant for a particular loss $\tilde L$ never get selected by $L_2$-Boosting is reflected by a respective singular part of $\nu^{(\tilde L)}$ w.r.t. $\nu^{(L_2)}$. With this concept at hand, it amounts to a suitable change of measure (accounting for singular parts) to make $L_2$-Boosting select variables according to a different loss $\tilde L$. As a consequence, this builds a bridge to simulation-based techniques, such as various resampling schemes or rejection sampling, to achieve this change of measure in an algorithmic way.
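The column selection frequencies mentioned in the abstract can be made concrete with a small sketch of componentwise $L_2$-Boosting. This is a textbook-style variant and not necessarily the authors' exact procedure: the function name, the learning rate, the number of steps, and the simulated data are all assumptions. The returned frequencies can be read as an empirical stand-in for $\nu^{(L_2)}$; the change of measure to another loss $\tilde L$ is not implemented here.

```python
import numpy as np


def l2_boost_selection_frequencies(X, y, n_steps=50, learning_rate=0.1):
    """Componentwise L2-Boosting on centered data.

    Returns the coefficient vector and the relative frequency with which
    each column was selected. Assumes no column of X is constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)              # center columns
    residual = y - y.mean()              # center response
    col_norm_sq = (Xc ** 2).sum(axis=0)  # squared column norms
    p = X.shape[1]
    coef = np.zeros(p)
    counts = np.zeros(p, dtype=int)
    for _ in range(n_steps):
        # Simple least-squares fit of the residual on each single column.
        beta = Xc.T @ residual / col_norm_sq
        # Reduction in residual sum of squares achieved by each column.
        gain = beta ** 2 * col_norm_sq
        j = int(np.argmax(gain))
        coef[j] += learning_rate * beta[j]
        residual = residual - learning_rate * beta[j] * Xc[:, j]
        counts[j] += 1
    return coef, counts / n_steps


# Simulated example (assumption): only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.5 * rng.standard_normal(200)
coef, freq = l2_boost_selection_frequencies(X, y)
print(np.round(freq, 2))
```

Columns that never win the per-step comparison receive frequency zero. In the abstract's terms, that is where a singular part can arise if such a column matters for a different loss.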