Michael Kaufmann

5papers

12citations

Novelty44%

AI Score37

Ranked #113,131 of 205,806 authors (top 55%)#24,767 in LG (top 58%)

5 Papers

48.0DMMay 15

A first view on the density of 5-planar graphs

Aaron Büngener, Jakob Franz, Michael Kaufmann et al.

A key concept for many graph layout algorithms is planarity, a graph property that allows to draw vertices and edges crossing-free in the plane. Important is the generalization to $k$-planar graphs, which can be drawn in the plane with at most $k > 0$ crossings per edge. One of the basic graph properties that have been explored for those graph classes is the maximum edge density, i.e., the maximum number of edges a $k$-planar graph on $n$ vertices may have. While there are numerous results for the classes of $1$- and $2$-planar graphs, there are few results for increasing $k=3$ or $4$ due to the complex graph structures. We make a first step towards even larger $k>4$ exploring the class of $5$-planar graphs. While our main tool is still a discharging technique, a better understanding of the structure of the denser parts leads to corresponding density bounds in a much simpler way. We first apply a simplified version of our technique to outer $5$-planar graphs and surprisingly observe that the structure of maximally dense (general) $5$-planar graphs differs from the known uniform structure of maximally dense $k$-planar graphs for smaller $1 \leq k \leq 4$. As the central result of this paper, we then show that graphs that admit a simple 5-planar drawing have at most $7(n-2)$ edges, drastically improving the previous best bound of $\approx8.3n$. This even implies a small improvement of the leading constant in the Crossing Lemma $cr(G) \ge c \frac{m^3}{n^2}$ from $c=\frac{1}{27.48}$ to $c=\frac{1}{27.3}$. To demonstrate the potential of our new technique, we also apply it to 4-planar and 6-planar graphs.

IRJan 14, 2022

The Lokahi Prototype: Toward the automatic Extraction of Entity Relationship Models from Text

Michael Kaufmann

Entity relationship extraction envisions the automatic generation of semantic data models from collections of text, by automatic recognition of entities, by association of entities to form relationships, and by classifying these instances to assign them to entity sets (or classes) and relationship sets (or associations). As a first step in this direction, the Lokahi prototype can extract entities based on the TF*IDF measure, and generate semantic relationships based on document-level co-occurrence statistics, for example with likelihood ratios and pointwise mutual information. This paper presents results of an explorative, prototypical, qualitative and synthetic research, summarizes insights from two research projects and, based on this, indicates an outline for further research in the field of entity relationship extraction from text.

DBApr 7, 2021

Efficient and Accurate In-Database Machine Learning with SQL Code Generation in Python

Michael Kaufmann, Gabriel Stechschulte, Anna Huber

Following an analysis of the advantages of SQL-based Machine Learning (ML) and a short literature survey of the field, we describe a novel method for In-Database Machine Learning (IDBML). We contribute a process for SQL-code generation in Python using template macros in Jinja2 as well as the prototype implementation of the process. We describe our implementation of the process to compute multidimensional histogram (MDH) probability estimation in SQL. For this, we contribute and implement a novel discretization method called equal quantized rank binning (EQRB) and equal-width binning (EWB). Based on this, we provide data gathered in a benchmarking experiment for the quantitative empirical evaluation of our method and system using the Covertype dataset. We measured accuracy and computation time and compared it to Scikit Learn state of the art classification algorithms. Using EWB, our multidimensional probability estimation was the fastest of all tested algorithms, while being only 1-2% less accurate than the best state of the art methods found (decision trees and random forests). Our method was significantly more accurate than Naive Bayes, which assumes independent one-dimensional probabilities and/or densities. Also, our method was significantly more accurate and faster than logistic regression. This motivates for further research in accuracy improvement and in IDBML with SQL code generation for big data and larger-than-memory datasets.

LGSep 11, 2019

Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle

Michael Kaufmann, Kornilios Kourtis, Celestine Mendler-Dünner et al.

Distributed machine learning training is one of the most common and important workloads running on data centers today, but it is rarely executed alone. Instead, to reduce costs, computing resources are consolidated and shared by different applications. In this scenario, elasticity and proper load balancing are vital to maximize efficiency, fairness, and utilization. Currently, most distributed training frameworks do not support the aforementioned properties. A few exceptions that do support elasticity, imitate generic distributed frameworks and use micro-tasks. In this paper we illustrate that micro-tasks are problematic for machine learning applications, because they require a high degree of parallelism which hinders the convergence of distributed training at a pure algorithmic level (i.e., ignoring overheads and scalability limitations). To address this, we propose Chicle, a new elastic distributed training framework which exploits the nature of machine learning algorithms to implement elasticity and load balancing without micro-tasks. We use Chicle to train deep neural network as well as generalized linear models, and show that Chicle achieves performance competitive with state of the art rigid frameworks, while efficiently enabling elastic execution and dynamic load balancing.

LGNov 6, 2018

Elastic CoCoA: Scaling In to Improve Convergence

Michael Kaufmann, Thomas Parnell, Kornilios Kourtis

In this paper we experimentally analyze the convergence behavior of CoCoA and show, that the number of workers required to achieve the highest convergence rate at any point in time, changes over the course of the training. Based on this observation, we build Chicle, an elastic framework that dynamically adjusts the number of workers based on feedback from the training algorithm, in order to select the number of workers that results in the highest convergence rate. In our evaluation of 6 datasets, we show that Chicle is able to accelerate the time-to-accuracy by a factor of up to 5.96x compared to the best static setting, while being robust enough to find an optimal or near-optimal setting automatically in most cases.