Hendrik Fichtenberger

DS
h-index36
7papers
87citations
Novelty51%
AI Score43

7 Papers

LGJul 26, 2023
HUGE: Huge Unsupervised Graph Embeddings with TPUs

Brandon Mayer, Anton Tsitsulin, Hendrik Fichtenberger et al.

Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in a graph. A continuous representation is often more amenable, especially at scale, for solving downstream machine learning tasks such as classification, link prediction, and clustering. A high-performance graph embedding architecture leveraging Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory is presented that simplifies the graph embedding problem and can scale to graphs with billions of nodes and trillions of edges. We verify the embedding space quality on real and synthetic large-scale datasets.

DSDec 22, 2025
Clustering with Label Consistency

Diptarka Chakraborty, Hendrik Fichtenberger, Bernhard Haeupler et al.

Designing efficient, effective, and consistent metric clustering algorithms is a significant challenge attracting growing attention. Traditional approaches focus on the stability of cluster centers; unfortunately, this neglects the real-world need for stable point labels, i.e., stable assignments of points to named sets (clusters). In this paper, we address this gap by initiating the study of label-consistent metric clustering. We first introduce a new notion of consistency, measuring the label distance between two consecutive solutions. Then, armed with this new definition, we design new consistent approximation algorithms for the classical $k$-center and $k$-median problems.

DSMar 21, 2024
A Differentially Private Clustering Algorithm for Well-Clustered Graphs

Weiqiang He, Hendrik Fichtenberger, Pan Peng

We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs have widespread application as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient ($ε$,$δ$)-DP algorithm tailored specifically for such graphs. Our algorithm draws inspiration from the recent work of Chen et al., who developed DP algorithms for recovery of stochastic block models in cases where the graph comprises exactly two nearly-balanced clusters. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters, and the misclassification ratio almost matches the one of the best-known non-private algorithms. We conduct experimental evaluations on datasets with known ground truth clusters to substantiate the prowess of our algorithm. We also show that any (pure) $ε$-DP algorithm would result in substantial error.

DCJul 14, 2025
Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality

Filipe Miguel Gonçalves de Almeida, CJ Carey, Hendrik Fichtenberger et al.

Learning and constructing large-scale graphs has attracted attention in recent decades, resulting in a rich literature that introduced various systems, tools, and algorithms. Grale is one of such tools that is designed for offline environments and is deployed in more than 50 different industrial settings at Google. Grale is widely applicable because of its ability to efficiently learn and construct a graph on datasets with multiple types of features. However, it is often the case that applications require the underlying data to evolve continuously and rapidly and the updated graph needs to be available with low latency. Such setting make the use of Grale prohibitive. While there are Approximate Nearest Neighbor (ANN) systems that handle dynamic updates with low latency, they are mostly limited to similarities over a single embedding. In this work, we introduce a system that inherits the advantages and the quality of Grale, and maintains a graph construction in a dynamic setting with tens of milliseconds of latency per request. We call the system Dynamic Grale Using ScaNN (Dynamic GUS). Our system has a wide range of applications with over 10 deployments at Google. One of the applications is in Android Security and Privacy, where Dynamic Grale Using ScaNN enables capturing harmful applications 4 times faster, before they can reach users.

DSFeb 23, 2022
Constant matters: Fine-grained Complexity of Differentially Private Continual Observation

Hendrik Fichtenberger, Monika Henzinger, Jalaj Upadhyay

We study fine-grained error bounds for differentially private algorithms for counting under continual observation. Our main insight is that the matrix mechanism when using lower-triangular matrices can be used in the continual observation model. More specifically, we give an explicit factorization for the counting matrix $M_\mathsf{count}$ and upper bound the error explicitly. We also give a fine-grained analysis, specifying the exact constant in the upper bound. Our analysis is based on upper and lower bounds of the {\em completely bounded norm} (cb-norm) of $M_\mathsf{count}$. Along the way, we improve the best-known bound of 28 years by Mathias (SIAM Journal on Matrix Analysis and Applications, 1993) on the cb-norm of $M_\mathsf{count}$ for a large range of the dimension of $M_\mathsf{count}$. Furthermore, we are the first to give concrete error bounds for various problems under continual observation such as binary counting, maintaining a histogram, releasing an approximately cut-preserving synthetic graph, many graph-based statistics, and substring and episode counting. Finally, we note that our result can be used to get a fine-grained error bound for non-interactive local learning {and the first lower bounds on the additive error for $(ε,δ)$-differentially-private counting under continual observation.} Subsequent to this work, Henzinger et al. (SODA2023) showed that our factorization also achieves fine-grained mean-squared error.

DSJun 28, 2021
Differentially Private Algorithms for Graphs Under Continual Observation

Hendrik Fichtenberger, Monika Henzinger, Lara Ost

Differentially private algorithms protect individuals in data analysis scenarios by ensuring that there is only a weak correlation between the existence of the user in the data and the result of the analysis. Dynamic graph algorithms maintain the solution to a problem (e.g., a matching) on an evolving input, i.e., a graph where nodes or edges are inserted or deleted over time. They output the value of the solution after each update operation, i.e., continuously. We study (event-level and user-level) differentially private algorithms for graph problems under continual observation, i.e., differentially private dynamic graph algorithms. We present event-level private algorithms for partially dynamic counting-based problems such as triangle count that improve the additive error by a polynomial factor (in the length $T$ of the update sequence) on the state of the art, resulting in the first algorithms with additive error polylogarithmic in $T$. We also give $\varepsilon$-differentially private and partially dynamic algorithms for minimum spanning tree, minimum cut, densest subgraph, and maximum matching. The additive error of our improved MST algorithm is $O(W \log^{3/2}T / \varepsilon)$, where $W$ is the maximum weight of any edge, which, as we show, is tight up to a $(\sqrt{\log T} / \varepsilon)$-factor. For the other problems, we present a partially-dynamic algorithm with multiplicative error $(1+β)$ for any constant $β> 0$ and additive error $O(W \log(nW) \log(T) / (\varepsilon β))$. Finally, we show that the additive error for a broad class of dynamic graph algorithms with user-level privacy must be linear in the value of the output solution's range.

LGOct 11, 2018
A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

Hendrik Fichtenberger, Dennis Rohde

In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^δ$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $ε$-far from being a $k$-NN graph? Here, $ε$-far means that one has to change more than an $ε$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / ε^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $Ω(\sqrt{n / εk})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.