LGMay 27, 2022
Generalized Reductions: Making any Hierarchical Clustering Fair and Balanced with Low CostMarina Knittel, Max Springer, John P. Dickerson et al.
Clustering is a fundamental building block of modern statistical analysis pipelines. Fair clustering has seen much attention from the machine learning community in recent years. We are some of the first to study fairness in the context of hierarchical clustering, after the results of Ahmadian et al. from NeurIPS in 2020. We evaluate our results using Dasgupta's cost function, perhaps one of the most prevalent theoretical metrics for hierarchical clustering evaluation. Our work vastly improves the previous $O(n^{5/6}poly\log(n))$ fair approximation for cost to a near polylogarithmic $O(n^δpoly\log(n))$ fair approximation for any constant $δ\in(0,1)$. This result establishes a cost-fairness tradeoff and extends to broader fairness constraints than the previous work. We also show how to alter existing hierarchical clusterings to guarantee fairness and cluster balance across any level in the hierarchy.
MLMar 8, 2023
Optimal Sparse Recovery with Decision StumpsKiarash Banihashem, MohammadTaghi Hajiaghayi, Max Springer
Decision trees are widely used for their low computational cost, good predictive performance, and ability to assess the importance of features. Though often used in practice for feature selection, the theoretical guarantees of these methods are not well understood. We here obtain a tight finite sample bound for the feature selection problem in linear regression using single-depth decision trees. We examine the statistical properties of these "decision stumps" for the recovery of the $s$ active features from $p$ total features, where $s \ll p$. Our analysis provides tight sample performance guarantees on high-dimensional sparse systems which align with the finite sample bound of $O(s \log p)$ as obtained by Lasso, improving upon previous bounds for both the median and optimal splitting criteria. Our results extend to the non-linear regime as well as arbitrary sub-Gaussian distributions, demonstrating that tree based methods attain strong feature selection properties under a wide variety of settings and further shedding light on the success of these methods in practice. As a byproduct of our analysis, we show that we can provably guarantee recovery even when the number of active features $s$ is unknown. We further validate our theoretical results and proof methodology using computational experiments.
4.2GTMay 22
Beyond the Half-Approximation: Fair and Efficient Online Class MatchingSander Borst, Max Springer
Online bipartite matching, where agents are known in advance but items arrive sequentially and must be irrevocably assigned, is fundamental to problems ranging from ride-sharing to online advertising. When agents belong to classes such as demographic groups or geographic regions, fairness demands equitable treatment across these groups. Recent work introduced class envy-freeness (CEF), a natural extension of the classical fair division notion: an algorithm is $α$-CEF if each class receives value at least an $α$ fraction of what it could extract from any other class's bundle. However, all known algorithms achieving constant-factor CEF guarantees attain utilitarian social welfare (total matching value) of at most $\frac{1}{2}$ times the optimum, far below the $1-\frac{1}{e} \approx 0.632$ achievable without fairness constraints. We resolve the open question of whether fairness necessitates this efficiency loss, by introducing threshold-based algorithms parameterized by $γ\in [0,1]$ that equalize allocations across classes until threshold $γ$, then maximize efficiency. For divisible matching, this yields simultaneous $(1-e^{-γ})$-CEF and $(1 - \frac{e^{γ-1}}{γ+1})$-USW guarantees; for indivisible matching, $\fracγ{2}$-CEF with the same USW. Setting $γ> 0$ produces the first algorithms beating $\frac{1}{2}$-USW while maintaining constant CEF. We complement this with a novel upper bound construction, proving no non-wasteful $α$-CEF algorithm can exceed $\frac{1 +α- e^{α-1}}{1+α}$-USW and correcting prior bounds that were vacuous for $α< 0.58$. Our upper bound nearly matches our algorithms' performance, giving the first substantive characterization of the price of fairness in online class matching.
LGNov 21, 2023
Fair Polylog-Approximate Low-Cost Hierarchical ClusteringMarina Knittel, Max Springer, John Dickerson et al.
Research in fair machine learning, and particularly clustering, has been crucial in recent years given the many ethical controversies that modern intelligent systems have posed. Ahmadian et al. [2020] established the study of fairness in \textit{hierarchical} clustering, a stronger, more structured variant of its well-known flat counterpart, though their proposed algorithm that optimizes for Dasgupta's [2016] famous cost function was highly theoretical. Knittel et al. [2023] then proposed the first practical fair approximation for cost, however they were unable to break the polynomial-approximate barrier they posed as a hurdle of interest. We break this barrier, proposing the first truly polylogarithmic-approximate low-cost fair hierarchical clustering, thus greatly bridging the gap between the best fair and vanilla hierarchical clustering approximations.
LGOct 29, 2023
An Improved Relaxation for Oracle-Efficient Adversarial Contextual BanditsKiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin et al.
We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of $O(T^{\frac{2}{3}}(K\log(|Π|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls per round to an offline optimization oracle, where $K$ denotes the number of actions, $T$ denotes the number of rounds and $Π$ denotes the set of policies. This is the first result to improve the prior best bound of $O((TK)^{\frac{2}{3}}(\log(|Π|))^{\frac{1}{3}})$ as obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007 which was obtained for the stochastic case.
LGFeb 17
The Geometry of Alignment Collapse: When Fine-Tuning Breaks SafetyMax Springer, Chung Peng Lee, Blossom Metevier et al.
Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and developers have no adversarial intent. We show that the prevailing explanation, that fine-tuning updates should be orthogonal to safety-critical directions in high-dimensional parameter space, offers false reassurance: we show this orthogonality is structurally unstable and collapses under the dynamics of gradient descent. We then resolve this through a novel geometric analysis, proving that alignment concentrates in low-dimensional subspaces with sharp curvature, creating a brittle structure that first-order methods cannot detect or defend. While initial fine-tuning updates may indeed avoid these subspaces, the curvature of the fine-tuning loss generates second-order acceleration that systematically steers trajectories into alignment-sensitive regions. We formalize this mechanism through the Alignment Instability Condition, three geometric properties that, when jointly satisfied, lead to safety degradation. Our main result establishes a quartic scaling law: alignment loss grows with the fourth power of training time, governed by the sharpness of alignment geometry and the strength of curvature coupling between the fine-tuning task and safety-critical parameters. These results expose a structural blind spot in the current safety paradigm. The dominant approaches to safe fine-tuning address only the initial snapshot of a fundamentally dynamic problem. Alignment fragility is not a bug to be patched; it is an intrinsic geometric property of gradient descent on curved manifolds. Our results motivate the development of curvature-aware methods, and we hope will further enable a shift in alignment safety analysis from reactive red-teaming to predictive diagnostics for open-weight model deployment.
LGAug 13, 2025
SYNAPSE-G: Bridging Large Language Models and Graph Learning for Rare Event ClassificationSasan Tavakkol, Lin Chen, Max Springer et al.
Scarcity of labeled data, especially for rare events, hinders training effective machine learning models. This paper proposes SYNAPSE-G (Synthetic Augmentation for Positive Sampling via Expansion on Graphs), a novel pipeline leveraging Large Language Models (LLMs) to generate synthetic training data for rare event classification, addressing the cold-start problem. This synthetic data serve as seeds for semi-supervised label propagation on a similarity graph constructed between the seeds and a large unlabeled dataset. This identifies candidate positive examples, subsequently labeled by an oracle (human or LLM). The expanded dataset then trains/fine-tunes a classifier. We theoretically analyze how the quality (validity and diversity) of the synthetic data impacts the precision and recall of our method. Experiments on the imbalanced SST2 and MHS datasets demonstrate SYNAPSE-G's effectiveness in finding positive labels, outperforming baselines including nearest neighbor search.
LGApr 20, 2025
Less is More: Adaptive Coverage for Synthetic Training DataSasan Tavakkol, Max Springer, Mohammadhossein Bateni et al.
Synthetic training data generation with Large Language Models (LLMs) like Google's Gemma and OpenAI's GPT offer a promising solution to the challenge of obtaining large, labeled datasets for training classifiers. When rapid model deployment is critical, such as in classifying emerging social media trends or combating new forms of online abuse tied to current events, the ability to generate training data is invaluable. While prior research has examined the comparability of synthetic data to human-labeled data, this study introduces a novel sampling algorithm, based on the maximum coverage problem, to select a representative subset from a synthetically generated dataset. Our results demonstrate that training a classifier on this contextually sampled subset achieves superior performance compared to training on the entire dataset. This "less is more" approach not only improves model accuracy but also reduces the volume of data required, leading to potentially more efficient model fine-tuning.