Bernhard Moser

h-index15

9papers

157citations

Novelty52%

AI Score41

Ranked #64,056 of 194,257 authors (top 33%)#14,432 in LG (top 36%)

9 Papers

6.6LGApr 3, 2023

On Mitigating the Utility-Loss in Differentially Private Learning: A new Perspective by a Geometrically Inspired Kernel Approach

Mohit Kumar, Bernhard A. Moser, Lukas Fischer

Privacy-utility tradeoff remains as one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff via significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on MNIST dataset, Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered and it is observed that the accuracy-loss due to data being distributed is either marginal or not significantly high.

15.8MLNov 16, 2017Code

Robust Unsupervised Domain Adaptation for Neural Networks via Moment Alignment

Werner Zellinger, Bernhard A. Moser, Thomas Grubinger et al.

A novel approach for unsupervised domain adaptation for neural networks is proposed. It relies on metric-based regularization of the learning process. The metric-based regularization aims at domain-invariant latent feature representations by means of maximizing the similarity between domain-specific activation distributions. The proposed metric results from modifying an integral probability metric such that it becomes less translation-sensitive on a polynomial function space. The metric has an intuitive interpretation in the dual space as the sum of differences of higher order central moments of the corresponding activation distributions. Under appropriate assumptions on the input distributions, error minimization is proven for the continuous case. As demonstrated by an analysis of standard benchmark experiments for sentiment analysis, object recognition and digit recognition, the outlined approach is robust regarding parameter changes and achieves higher classification accuracies than comparable approaches. The source code is available at https://github.com/wzell/mann.

6.4LGJul 5, 2024

Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient Descent

Mohit Kumar, Alexander Valentinitsch, Magdalena Fuchs et al.

This paper develops a novel mathematical framework for collaborative learning by means of geometrically inspired kernel machines which includes statements on the bounds of generalisation and approximation errors, and sample complexity. For classification problems, this approach allows us to learn bounded geometric structures around given data points and hence solve the global model learning problem in an efficient way by exploiting convexity properties of the related optimisation problem in a Reproducing Kernel Hilbert Space (RKHS). In this way, we can reduce classification problems to determining the closest bounded geometric structure from a given data point. Further advantages that come with our solution is that our approach does not require clients to perform multiple epochs of local optimisation using stochastic gradient descent, nor require rounds of communication between client/server for optimising the global model. We highlight that numerous experiments have shown that the proposed method is a competitive alternative to the state-of-the-art.

7.1LGNov 30, 2025

Operator-Theoretic Framework for Gradient-Free Federated Learning

Mohit Kumar, Mathias Brucker, Alexander Valentinitsch et al.

Federated learning must address heterogeneity, strict communication and computation limits, and privacy while ensuring performance. We propose an operator-theoretic framework that maps the $L^2$-optimal solution into a reproducing kernel Hilbert space (RKHS) via a forward operator, approximates it using available data, and maps back with the inverse operator, yielding a gradient-free scheme. Finite-sample bounds are derived using concentration inequalities over operator norms, and the framework identifies a data-dependent hypothesis space with guarantees on risk, error, robustness, and approximation. Within this space we design efficient kernel machines leveraging the space folding property of Kernel Affine Hull Machines. Clients transfer knowledge via a scalar space folding measure, reducing communication and enabling a simple differentially private protocol: summaries are computed from noise-perturbed data matrices in one step, avoiding per-round clipping and privacy accounting. The induced global rule requires only integer minimum and equality-comparison operations per test point, making it compatible with fully homomorphic encryption (FHE). Across four benchmarks, the gradient-free FL method with fixed encoder embeddings matches or outperforms strong gradient-based fine-tuning, with gains up to 23.7 points. In differentially private experiments, kernel smoothing mitigates accuracy loss in high-privacy regimes. The global rule admits an FHE realization using $Q \times C$ encrypted minimum and $C$ equality-comparison operations per test point, with operation-level benchmarks showing practical latencies. Overall, the framework provides provable guarantees with low communication, supports private knowledge transfer via scalar summaries, and yields an FHE-compatible prediction rule offering a mathematically grounded alternative to gradient-based federated learning under heterogeneity.

6.6SCMay 26, 2023

Representing Piecewise Linear Functions by Functions with Small Arity

Christoph Koutschan, Bernhard Moser, Anton Ponomarchuk et al.

A piecewise linear function can be described in different forms: as an arbitrarily nested expression of $\min$- and $\max$-functions, as a difference of two convex piecewise linear functions, or as a linear combination of maxima of affine-linear functions. In this paper, we provide two main results: first, we show that for every piecewise linear function there exists a linear combination of $\max$-functions with at most $n+1$ arguments, and give an algorithm for its computation. Moreover, these arguments are contained in the finite set of affine-linear functions that coincide with the given function in some open set. Second, we prove that the piecewise linear function $\max(0, x_{1}, \ldots, x_{n})$ cannot be represented as a linear combination of maxima of less than $n+1$ affine-linear arguments. This was conjectured by Wang and Sun in 2005 in a paper on representations of piecewise linear functions as linear combination of maxima.

18.4MLMay 2, 2023Code

Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation

Marius-Constantin Dinu, Markus Holzleitner, Maximilian Beck et al.

We study the problem of choosing algorithm hyper-parameters in unsupervised domain adaptation, i.e., with labeled data in a source domain and unlabeled data in a target domain, drawn from a different input distribution. We follow the strategy to compute several models using different hyper-parameters, and, to subsequently compute a linear aggregation of the models. While several heuristics exist that follow this strategy, methods are still missing that rely on thorough theories for bounding the target error. In this turn, we propose a method that extends weighted least squares to vector-valued functions, e.g., deep neural networks. We show that the target error of the proposed algorithm is asymptotically not worse than twice the error of the unknown optimal aggregation. We also perform a large scale empirical comparative study on several datasets, including text, images, electroencephalogram, body sensor signals and signals from mobile phones. Our method outperforms deep embedded validation (DEV) and importance weighted validation (IWV) on all datasets, setting a new state-of-the-art performance for solving parameter choice issues in unsupervised domain adaptation with theoretical error guarantees. We further study several competitive heuristics, all outperforming IWV and DEV on at least five datasets. However, our method outperforms each heuristic on at least five of seven datasets.

3.1LGJun 6, 2021

Information Theoretic Evaluation of Privacy-Leakage, Interpretability, and Transferability for Trustworthy AI

Mohit Kumar, Bernhard A. Moser, Lukas Fischer et al.

In order to develop machine learning and deep learning models that take into account the guidelines and principles of trustworthy AI, a novel information theoretic trustworthy AI framework is introduced. A unified approach to "privacy-preserving interpretable and transferable learning" is considered for studying and optimizing the tradeoffs between privacy, interpretability, and transferability aspects. A variational membership-mapping Bayesian model is used for the analytical approximations of the defined information theoretic measures for privacy-leakage, interpretability, and transferability. The approach consists of approximating the information theoretic measures via maximizing a lower-bound using variational optimization. The study presents a unified information theoretic approach to study different aspects of trustworthy AI in a rigorous analytical manner. The approach is demonstrated through numerous experiments on benchmark datasets and a real-world biomedical application concerned with the detection of mental stress on individuals using heart rate variability analysis.

7.5LGApr 14, 2021

Membership-Mappings for Data Representation Learning: Measure Theoretic Conceptualization

Mohit Kumar, Bernhard A. Moser, Lukas Fischer et al.

A fuzzy theoretic analytical approach was recently introduced that leads to efficient and robust models while addressing automatically the typical issues associated to parametric deep models. However, a formal conceptualization of the fuzzy theoretic analytical deep models is still not available. This paper introduces using measure theoretic basis the notion of \emph{membership-mapping} for representing data points through attribute values (motivated by fuzzy theory). A property of the membership-mapping, that can be exploited for data representation learning, is of providing an interpolation on the given data points in the data space. An analytical approach to the variational learning of a membership-mappings based data representation model is considered.

3.8MLMay 20, 2020Code

ReLU Code Space: A Basis for Rating Network Quality Besides Accuracy

Natalia Shepeleva, Werner Zellinger, Michal Lewandowski et al.

We propose a new metric space of ReLU activation codes equipped with a truncated Hamming distance which establishes an isometry between its elements and polyhedral bodies in the input space which have recently been shown to be strongly related to safety, robustness, and confidence. This isometry allows the efficient computation of adjacency relations between the polyhedral bodies. Experiments on MNIST and CIFAR-10 indicate that information besides accuracy might be stored in the code space.