71.8DCJun 2
SIGMA: A Versatile Streaming Graph Partitioner for Vertex- and Edge-Balanced Distributed GNN TrainingBarbara Hoffmann, Shai Dorian Peretz, Adil Chhabra et al.
Distributed Graph Neural Network (GNN) training depends critically on how the underlying graph is partitioned across compute resources. Existing graph partitioners focus either on vertex partitioning or edge partitioning and typically optimize only a single communication objective (edge cut or vertex cut) under a single balance constraint (vertex balance or edge balance). We present SIGMA (Streaming Integrated Graph Partitioning with Multi-objective Awareness), a versatile streaming graph partitioner that supports both vertex and edge partitioning within a unified multi-objective, multi-constraint framework. Depending on the target distributed GNN system, SIGMA can be configured for edgecut-oriented vertex partitioning or vertex-cut-oriented edge partitioning while simultaneously accounting for both vertex and edge balancing. A clustering-based preprocessing stage incorporates global graph structure to improve partition quality while preserving the efficiency and scalability advantages of streaming partitioning. We evaluate SIGMA on six benchmark graphs spanning diverse domains and scales using two distributed GNN training systems: Dist-GNN (edge-partitioned) and DistDGL (vertex-partitioned). Across both settings, SIGMA consistently achieves strong performance, showing its ability to navigate complex trade-offs between partition quality, training efficiency, and memory consumption, frequently outperforming streaming baselines while remaining competitive with high-quality in-memory partitioners such as METIS, KaHIP and HEP. These results demonstrate that a unified streaming partitioner can effectively address the communication, compute, and memory challenges of distributed GNN training across fundamentally different system architectures.
DSMay 26, 2022
More Recent Advances in (Hyper)Graph PartitioningÜmit V. Çatalyürek, Karen D. Devine, Marcelo Fonseca Faraj et al.
In recent years, significant advances have been made in the design and evaluation of balanced (hyper)graph partitioning algorithms. We survey trends of the last decade in practical algorithms for balanced (hyper)graph partitioning together with future research directions. Our work serves as an update to a previous survey on the topic. In particular, the survey extends the previous survey by also covering hypergraph partitioning and streaming algorithms, and has an additional focus on parallel algorithms.
SIMay 11, 2022
Local Motif Clustering via (Hyper)Graph PartitioningAdil Chhabra, Marcelo Fonseca Faraj, Christian Schulz
A widely-used operation on graphs is local clustering, i.e., extracting a well-characterized community around a seed node without the need to process the whole graph. Recently local motif clustering has been proposed: it looks for a local cluster based on the distribution of motifs. Since this local clustering perspective is relatively new, most approaches proposed for it are extensions of statistical and numerical methods previously used for edge-based local clustering, while the available combinatorial approaches are still few and relatively simple. In this work, we build a hypergraph and a graph model which both represent the motif-distribution around the seed node. We solve these models using sophisticated combinatorial algorithms designed for (hyper)graph partitioning. In extensive experiments with the triangle motif, we observe that our algorithm computes communities with a motif conductance value being one third on average in comparison against the communities computed by the state-of-the-art tool MAPPR while being 6.3 times faster on average.
DSFeb 1, 2023
Improved Exact and Heuristic Algorithms for Maximum Weight CliqueRoman Erhardt, Kathrin Hanauer, Nils Kriege et al.
We propose improved exact and heuristic algorithms for solving the maximum weight clique problem, a well-known problem in graph theory with many applications. Our algorithms interleave successful techniques from related work with novel data reduction rules that use local graph structure to identify and remove vertices and edges while retaining the optimal solution. We evaluate our algorithms on a range of synthetic and real-world graphs, and find that they outperform the current state of the art on most inputs. Our data reductions always produce smaller reduced graphs than existing data reductions alone. As a result, our exact algorithm, MWCRedu, finds solutions orders of magnitude faster on naturally weighted, medium-sized map labeling graphs and random hyperbolic graphs. Our heuristic algorithm, MWCPeel, outperforms its competitors on these instances, but is slightly less effective on extremely dense or large instances.
CVDec 15, 2022
Attention-based Multiple Instance Learning for Survival Prediction on Lung Cancer Tissue MicroarraysJonas Ammeling, Lars-Henning Schmidt, Jonathan Ganz et al.
Attention-based multiple instance learning (AMIL) algorithms have proven to be successful in utilizing gigapixel whole-slide images (WSIs) for a variety of different computational pathology tasks such as outcome prediction and cancer subtyping problems. We extended an AMIL approach to the task of survival prediction by utilizing the classical Cox partial likelihood as a loss function, converting the AMIL model into a nonlinear proportional hazards model. We applied the model to tissue microarray (TMA) slides of 330 lung cancer patients. The results show that AMIL approaches can handle very small amounts of tissue from a TMA and reach similar C-index performance compared to established survival prediction methods trained with highly discriminative clinical factors such as age, cancer grade, and cancer stage
10.2DSMar 12
Efficient Parallel Algorithms for Hypergraph MatchingHenrik Reinstädtler, Christian Schulz, Nodari Sitchinava et al.
We present efficient parallel algorithms for computing maximal matchings in hypergraphs. Our algorithm finds locally maximal edges in the hypergraph and adds them in parallel to the matching. In the CRCW PRAM models our algorithms achieve $O(\log{m})$ time with $O(κ\log {m})$ work w.h.p. where $m$ is the number of hyperedges, and $κ$ is the sum of all vertex degrees. The CREW PRAM model algorithm has a running time of $O((\logÎ+\log{d})\log{m})$ and requires $O(κ\log {m})$ work w.h.p. It can be implemented work-optimal with $O(κ)$ work in $O((\log{m}+\log{n})\log{m})$ time. We prove a $1/d$-approximation guarantee for our algorithms. We evaluate our algorithms experimentally by implementing and running the proposed algorithms on the GPU using CUDA and Kokkos. Our experimental evaluation demonstrates the practical efficiency of our approach on real-world hypergraph instances, yielding a speed up of up to 76 times compared to a single-core CPU algorithm.
20.7DSMar 18
A Comprehensive Survey of Data Reduction Rules for the Maximum Weighted Independent Set ProblemErnestine Großmann, Kenneth Langedal, Christian Schulz
The Maximum Weight Independent Set (MWIS) problem, as well as its related problems such as Minimum Weight Vertex Cover, are fundamental NP-hard problems with numerous practical applications. Due to their computational complexity, a variety of data reduction rules have been proposed in recent years to simplify instances of these problems, enabling exact solvers and heuristics to handle them more effectively. Data reduction rules are polynomial time procedures that can reduce an instance while ensuring that an optimal solution on the reduced instance can be easily extended to an optimal solution for the original instance. Data reduction rules have proven to be especially useful in branch-and-reduce methods, where successful reductions often lead to problem instances that can be solved exactly. This survey provides a comprehensive overview of data reduction rules for the MWIS problem. We also provide a reference implementation for these reductions. This survey will be updated as new reduction techniques are developed, serving as a centralized resource for researchers and practitioners.
52.1DSMar 26
Advances in Exact and Approximate Group Closeness Centrality MaximizationChristian Schulz, Jakob Ternes, Henning Woydt
In the NP-hard \textsc{Group Closeness Centrality Maximization} problem, the input is a graph $G = (V,E)$ and a positive integer $k$, and the task is to find a set $S \subseteq V$ of size $k$ that maximizes the reciprocal of group farness $f(S) = \sum_{v \in V} \min_{s \in S} \text{dist}(v,s)$. A widely used greedy algorithm with previously unknown approximation guarantee may produce arbitrarily poor approximations. To efficiently obtain solutions with quality guarantees, known exact and approximation algorithms are revised. The state-of-the-art exact algorithm iteratively solves ILPs of increasing size until the ILP at hand can represent an optimal solution. In this work, we propose two new techniques to further improve the algorithm. The first technique reduces the size of the ILPs while the second technique aims to minimize the number of needed iterations. Our improvements yield a speedup by a factor of $3.6$ over the next best exact algorithm and can achieve speedups by up to a factor of $22.3$. Furthermore, we add reduction techniques to a $1/5$-approximation algorithm, and show that these adaptations do not compromise its approximation guarantee. The improved algorithm achieves mean speedups of $1.4$ and a maximum speedup of up to $2.9$ times.
7.4DCMar 13
GPU-Accelerated Algorithms for Process MappingPetr Samoldekin, Christian Schulz, Henning Woydt
Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph partitioners, we propose two GPU-accelerated algorithms for this optimization problem. The first algorithm employs hierarchical multisection, which partitions the task graph alongside the hierarchy of the supercomputer. The method utilizes GPU-based graph partitioners to accelerate the mapping process. The second algorithm integrates process mapping directly into the modern multilevel graph partitioning pipeline. Vital phases like coarsening and refinement are accelerated by exploiting the parallelism of GPUs. The first algorithm has, on average, about 12 percent higher communication costs than the state-of-the-art solver and thus remains competitive with it. However, in terms of speed, it vastly outperforms the competitor with a geometric mean speedup of 22 times and a maximum speedup of 934 times. The second approach is even faster, with a geometric mean speedup of 1454 times and a peak speedup of 12376 times. Compared to other algorithms that prioritize speed over solution quality, this approach has the same quality but much greater speedups. To our knowledge, these are the first GPU-based algorithms for process mapping.
LGFeb 8, 2025
CluStRE: Streaming Graph Clustering with Multi-Stage RefinementAdil Chhabra, Shai Dorian Peretz, Christian Schulz
We present CluStRE, a novel streaming graph clustering algorithm that balances computational efficiency with high-quality clustering using multi-stage refinement. Unlike traditional in-memory clustering approaches, CluStRE processes graphs in a streaming setting, significantly reducing memory overhead while leveraging re-streaming and evolutionary heuristics to improve solution quality. Our method dynamically constructs a quotient graph, enabling modularity-based optimization while efficiently handling large-scale graphs. We introduce multiple configurations of CluStRE to provide trade-offs between speed, memory consumption, and clustering quality. Experimental evaluations demonstrate that CluStRE improves solution quality by 89.8%, operates 2.6 times faster, and uses less than two-thirds of the memory required by the state-of-the-art streaming clustering algorithm on average. Moreover, our strongest mode enhances solution quality by up to 150% on average. With this, CluStRE achieves comparable solution quality to in-memory algorithms, i.e. over 96% of the quality of clustering approaches, including Louvain, effectively bridging the gap between streaming and traditional clustering methods.
CVJan 19, 2022
Weakly Supervised Semantic Segmentation of Remote Sensing Images for Tree Species Classification Based on Explanation MethodsSteve Ahlswede, Nimisha Thekke-Madam, Christian Schulz et al.
The collection of a high number of pixel-based labeled training samples for tree species identification is time consuming and costly in operational forestry applications. To address this problem, in this paper we investigate the effectiveness of explanation methods for deep neural networks in performing weakly supervised semantic segmentation using only image-level labels. Specifically, we consider four methods:i) class activation maps (CAM); ii) gradient-based CAM; iii) pixel correlation module; and iv) self-enhancing maps (SEM). We compare these methods with each other using both quantitative and qualitative measures of their segmentation accuracy, as well as their computational requirements. Experimental results obtained on an aerial image archive show that:i) considered explanation techniques are highly relevant for the identification of tree species with weak supervision; and ii) the SEM outperforms the other considered methods. The code for this paper is publicly available at https://git.tu-berlin.de/rsim/rs_wsss.
DSAug 12, 2020
Boosting Data Reduction for the Maximum Weight Independent Set Problem Using Increasing TransformationsAlexander Gellner, Sebastian Lamm, Christian Schulz et al.
Given a vertex-weighted graph, the maximum weight independent set problem asks for a pair-wise non-adjacent set of vertices such that the sum of their weights is maximum. The branch-and-reduce paradigm is the de facto standard approach to solve the problem to optimality in practice. In this paradigm, data reduction rules are applied to decrease the problem size. These data reduction rules ensure that given an optimum solution on the new (smaller) input, one can quickly construct an optimum solution on the original input. We introduce new generalized data reduction and transformation rules for the problem. A key feature of our work is that some transformation rules can increase the size of the input. Surprisingly, these so-called increasing transformations can simplify the problem and also open up the reduction space to yield even smaller irreducible graphs later throughout the algorithm. In experiments, our algorithm computes significantly smaller irreducible graphs on all except one instance, solves more instances to optimality than previously possible, is up to two orders of magnitude faster than the best state-of-the-art solver, and finds higher-quality solutions than heuristic solvers DynWVC and HILS on many instances. While the increasing transformations are only efficient enough for preprocessing at this time, we see this as a critical initial step towards a new branch-and-transform paradigm.
DSFeb 6, 2020
Multilevel Acyclic Hypergraph PartitioningMerten Popp, Sebastian Schlag, Christian Schulz et al.
A directed acyclic hypergraph is a generalized concept of a directed acyclic graph, where each hyperedge can contain an arbitrary number of tails and heads. Directed hypergraphs can be used to model data flow and execution dependencies in streaming applications. Thus, hypergraph partitioning algorithms can be used to obtain efficient parallelizations for multiprocessor architectures. However, an acyclicity constraint on the partition is necessary when mapping streaming applications to embedded multiprocessors due to resource restrictions on this type of hardware. The acyclic hypergraph partitioning problem is to partition the hypernodes of a directed acyclic hypergraph into a given number of blocks of roughly equal size such that the corresponding quotient graph is acyclic while minimizing an objective function on the partition. Here, we contribute the first n-level algorithm for the acyclic hypergraph partitioning problem. Our focus is on acyclic hypergraphs where hyperedges can have one head and arbitrary many tails. Based on this, we engineer a memetic algorithm to further reduce communication cost, as well as to improve scheduling makespan on embedded multiprocessor architectures. Experiments indicate that our algorithm outperforms previous algorithms that focus on the directed acyclic graph case which have previously been employed in the application domain. Moreover, our experiments indicate that using the directed hypergraph model for this type of application yields a significantly smaller makespan.
DSNov 30, 2019
Scalable Graph AlgorithmsChristian Schulz
Processing large complex networks recently attracted considerable interest. Complex graphs are useful in a wide range of applications from technological networks to biological systems like the human brain. Sometimes these networks are composed of billions of entities that give rise to emerging properties and structures. Analyzing these structures aids us in gaining new insights about our surroundings. As huge networks become abundant, there is a need for scalable algorithms to perform analysis. A prominent example is the PageRank algorithm, which is one of the measures used by web search engines such as Google to rank web pages displayed to the user. In order to find these patterns, massive amounts of data have to be acquired and processed. Designing and evaluating scalable graph algorithms to handle these data sets is a crucial task on the road to understanding the underlying systems. This habilitation thesis is a summary a broad spectrum of scalable graph algorithms that I developed over the last six years with many coauthors. In general, this research is based on four pillars: multilevel algorithms, practical kernelization, parallelization and memetic algorithms that are highly interconnected. Experiments conducted indicate that our algorithms find better solutions and/or are much more scalable than the previous state-of-the-art.
LGAug 20, 2018
Faster Support Vector MachinesSebastian Schlag, Matthias Schmitt, Christian Schulz
The time complexity of support vector machines (SVMs) prohibits training on huge data sets with millions of data points. Recently, multilevel approaches to train SVMs have been developed to allow for time-efficient training on huge data sets. While regular SVMs perform the entire training in one -- time consuming -- optimization step, multilevel SVMs first build a hierarchy of problems decreasing in size that resemble the original problem and then train an SVM model for each hierarchy level, benefiting from the solved models of previous levels. We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy. Extensive experiments indicate that our approach is up to orders of magnitude faster than the previous fastest algorithm while having comparable classification quality. For example, already one of our sequential solvers is on average a factor 15 faster than the parallel ThunderSVM algorithm, while having similar classification quality.
NEFeb 20, 2018
Memetic Graph ClusteringSonja Biedermann, Monika Henzinger, Christian Schulz et al.
It is common knowledge that there is no single best strategy for graph clustering, which justifies a plethora of existing approaches. In this paper, we present a general memetic algorithm, VieClus, to tackle the graph clustering problem. This algorithm can be adapted to optimize different objective functions. A key component of our contribution are natural recombine operators that employ ensemble clusterings as well as multi-level techniques. Lastly, we combine these techniques with a scalable communication protocol, producing a system that is able to compute high-quality solutions in a short amount of time. We instantiate our scheme with local search for modularity and show that our algorithm successfully improves or reproduces all entries of the 10th DIMACS implementation~challenge under consideration using a small amount of time.
DSSep 25, 2017
Evolutionary Acyclic Graph PartitioningOrlando Moreira, Merten Popp, Christian Schulz
Directed graphs are widely used to model data flow and execution dependencies in streaming applications. This enables the utilization of graph partitioning algorithms for the problem of parallelizing computation for multiprocessor architectures. However due to resource restrictions, an acyclicity constraint on the partition is necessary when mapping streaming applications to an embedded multiprocessor. Here, we contribute a multi-level algorithm for the acyclic graph partitioning problem. Based on this, we engineer an evolutionary algorithm to further reduce communication cost, as well as to improve load balancing and the scheduling makespan on embedded multiprocessor architectures.
DSApr 3, 2017
Graph Partitioning with Acyclicity ConstraintsOrlando Moreira, Merten Popp, Christian Schulz
Graphs are widely used to model execution dependencies in applications. In particular, the NP-complete problem of partitioning a graph under constraints receives enormous attention by researchers because of its applicability in multiprocessor scheduling. We identified the additional constraint of acyclic dependencies between blocks when mapping computer vision and imaging applications to a heterogeneous embedded multiprocessor. Existing algorithms and heuristics do not address this requirement and deliver results that are not applicable for our use-case. In this work, we show that this more constrained version of the graph partitioning problem is NP-complete and present heuristics that achieve a close approximation of the optimal solution found by an exhaustive search for small problem instances and much better scalability for larger instances. In addition, we can show a positive impact on the schedule of a real imaging application that improves communication volume and execution time.
NEFeb 6, 2017
Distributed Evolutionary k-way Node SeparatorsPeter Sanders, Christian Schulz, Darren Strash et al.
Computing high quality node separators in large graphs is necessary for a variety of applications, ranging from divide-and-conquer algorithms to VLSI design. In this work, we present a novel distributed evolutionary algorithm tackling the k-way node separator problem. A key component of our contribution includes new k-way local search algorithms based on maximum flows. We combine our local search with a multilevel approach to compute an initial population for our evolutionary algorithm, and further show how to modify the coarsening stage of our multilevel algorithm to create effective combine and mutation operations. Lastly, we combine these techniques with a scalable communication protocol, producing a system that is able to compute high quality solutions in a short amount of time. Our experiments against competing algorithms show that our advanced evolutionary algorithm computes the best result on 94% of the chosen benchmark instances.
DSSep 2, 2015
Finding Near-Optimal Independent Sets at ScaleSebastian Lamm, Peter Sanders, Christian Schulz et al.
The independent set problem is NP-hard and particularly difficult to solve in large sparse graphs. In this work, we develop an advanced evolutionary algorithm, which incorporates kernelization techniques to compute large independent sets in huge sparse networks. A recent exact algorithm has shown that large networks can be solved exactly by employing a branch-and-reduce technique that recursively kernelizes the graph and performs branching. However, one major drawback of their algorithm is that, for huge graphs, branching still can take exponential time. To avoid this problem, we recursively choose vertices that are likely to be in a large independent set (using an evolutionary approach), then further kernelize the graph. We show that identifying and removing vertices likely to be in large independent sets opens up the reduction space---which not only speeds up the computation of large independent sets drastically, but also enables us to compute high-quality independent sets on much larger instances than previously reported in the literature.
OCApr 29, 2015
Incorporating Road Networks into Territory DesignNitin Ahuja, Matthias Bender, Peter Sanders et al.
Given a set of basic areas, the territory design problem asks to create a predefined number of territories, each containing at least one basic area, such that an objective function is optimized. Desired properties of territories often include a reasonable balance, compact form, contiguity and small average journey times which are usually encoded in the objective function or formulated as constraints. We address the territory design problem by developing graph theoretic models that also consider the underlying road network. The derived graph models enable us to tackle the territory design problem by modifying graph partitioning algorithms and mixed integer programming formulations so that the objective of the planning problem is taken into account. We test and compare the algorithms on several real world instances.
DSFeb 5, 2015
Graph Partitioning for Independent SetsSebastian Lamm, Peter Sanders, Christian Schulz
Computing maximum independent sets in graphs is an important problem in computer science. In this paper, we develop an evolutionary algorithm to tackle the problem. The core innovations of the algorithm are very natural combine operations based on graph partitioning and local search algorithms. More precisely, we employ a state-of-the-art graph partitioner to derive operations that enable us to quickly exchange whole blocks of given independent sets. To enhance newly computed offsprings we combine our operators with a local search algorithm. Our experimental evaluation indicates that we are able to outperform state-of-the-art algorithms on a variety of instances.
DCApr 18, 2014
Parallel Graph Partitioning for Complex NetworksHenning Meyerhenke, Peter Sanders, Christian Schulz
Processing large complex networks like social networks or web graphs has recently attracted considerable interest. In order to do this in parallel, we need to partition them into pieces of about equal size. Unfortunately, previous parallel graph partitioners originally developed for more regular mesh-like networks do not work well for these networks. This paper addresses this problem by parallelizing and adapting the label propagation technique originally developed for graph clustering. By introducing size constraints, label propagation becomes applicable for both the coarsening and the refinement phase of multilevel graph partitioning. We obtain very high quality by applying a highly parallel evolutionary algorithm to the coarsened graph. The resulting system is both more scalable and achieves higher quality than state-of-the-art systems like ParMetis or PT-Scotch. For large complex networks the performance differences are very big. For example, our algorithm can partition a web graph with 3.3 billion edges in less than sixteen seconds using 512 cores of a high performance cluster while producing a high quality partition -- none of the competing systems can handle this graph on our system.
DSOct 1, 2012
Think Locally, Act Globally: Perfectly Balanced Graph PartitioningPeter Sanders, Christian Schulz
We present a novel local improvement scheme for the perfectly balanced graph partitioning problem. This scheme encodes local searches that are not restricted to a balance constraint into a model allowing us to find combinations of these searches maintaining balance by applying a negative cycle detection algorithm. We combine this technique with an algorithm to balance unbalanced solutions and integrate it into a parallel multi-level evolutionary algorithm, KaFFPaE, to tackle the problem. Overall, we obtain a system that is fast on the one hand and on the other hand is able to improve or reproduce most of the best known perfectly balanced partitioning results ever reported in the literature.