DSMay 28
Explaining Rankings with Hidden Group BonusesAlvin Hong Yao Yan, Suraj Shetiya, Sujoy Bhore et al.
Determining a linear utility function that correlates with observed candidate rankings is a foundational problem with applications in domains such as admissions, hiring, and recommendation systems, e.g., [Storandt and Funke, AAAI'19, Zhang et al., KDD'23, Wang et al., ICDE'24 (best paper award), Chen and Wong, VLDB'24]. Traditionally, these models assume full visibility into the feature sets used to determine the utility score. However, real-world scenarios often involve sensitive attributes that are hidden or partially observed, yet may influence outcomes through additive bonuses designed to promote fairness, as in [Gale and Marian, ICDE'24]. Motivated by such practical concerns, we study a variant of the ranking explanation problem where sensitive features are unobserved but may influence candidate rankings through group-specific linear boosts. We present a formal framework for modeling this problem and develop an algorithmic solution that leverages constraint satisfaction and automated reasoning techniques to jointly infer the linear scoring parameters and latent group bonuses consistent with the observed rankings. We further show that determining a satisfying linear function with group-specific bonuses is \textsf{NP}-hard in general, but when the feature dimension and the number of groups are constant, the problem admits a polynomial-time solution. Our approach is the first to address this nuanced variant, which captures key real-world challenges in fair ranking and admission systems. We perform extensive experiments on both real-world and synthetic datasets, demonstrating that our method effectively recovers hidden bonus structures and provides faithful explanations of observed ranking outcomes.
CGMar 24
Dynamic Light Spanners in Doubling MetricsSujoy Bhore, Jonathan Conroy, Arnold Filtser
A $t$-spanner of a point set $X$ in a metric space $(\mathcal{X}, δ)$ is a graph $G$ with vertex set $P$ such that, for any pair of points $u,v \in X$, the distance between $u$ and $v$ in $G$ is at most $t$ times $δ(u,v)$. We study the problem of maintaining a spanner for a dynamic point set $X$ -- that is, when $X$ undergoes a sequence of insertions and deletions -- in a metric space of constant doubling dimension. For any constant $\varepsilon>0$, we maintain a $(1+\varepsilon)$-spanner of $P$ whose total weight remains within a constant factor of the weight of the minimum spanning tree of $X$. Each update (insertion or deletion) can be performed in $\operatorname{poly}(\log Φ)$ time, where $Φ$ denotes the aspect ratio of $X$. Prior to our work, no efficient dynamic algorithm for maintaining a light-weight spanner was known even for point sets in low-dimensional Euclidean space.
DSApr 15
Online TCP Acknowledgment under General DelaysSujoy Bhore, Michał Pawłowski, Seeun William Umboh
In a seminal work, Dooly, Goldman, and Scott (STOC 1998; JACM 2001) introduced the classic Online TCP Acknowledgment problem. In this problem, a sequence of $n$ packets arrives over time, and the objective is to minimize both the number of acknowledgments sent and the total delay experienced by the packets. They showed that a greedy algorithm -- acknowledge when the delay of pending packets equals the acknowledgment cost -- is $2$-competitive. Online TCP Acknowledgment is the canonical online problem with delay, capturing the fundamental tradeoff between reducing service cost through batching and the delay incurred by pending requests. Prior work has largely focused on different types of service costs, e.g., Joint Replenishment. However, to the best of our knowledge, beyond the work of Albers and Bals (SODA 2003), which studies maximum delay and closely related objectives, not much is known for more general delay cost models. In this work, we study the Online TCP Acknowledgment under two new generalized delay costs that we call batch-oblivious and batch-aware. In the former, the delay cost is a function of the vector of packet delays. We show that the classic greedy approach is also $2$-competitive for continuous submodular functions and $\ell_p$ norms. In the batch-aware model, each batch incurs a delay cost that is a function $f$ of the vector of delays incurred by its packets. When the overall delay cost is the maximum of the batch delay costs, we show that the greedy approach is also $2$-competitive. Our main technical contribution is for the batch-aware setting, where the overall delay cost is the sum of the batch delay costs. We show that the greedy approach is $Ω(n)$-competitive and that the deterministic competitive ratio is $Θ(\log n)$. Remarkably, our algorithm only requires the bare minimum assumption that $f$ is monotone.
CCMar 24
Non-Clashing Teaching in Graphs: Algorithms, Complexity, and BoundsSujoy Bhore, Liana Khazaliya, Fionn Mc Inerney
Kirkpatrick et al. [ALT 2019] and Fallat et al. [JMLR 2023] introduced non-clashing teaching and proved that it is the most efficient batch machine teaching model satisfying the collusion-avoidance benchmark established in the seminal work of Goldman and Mathias [COLT 1993]. Recently, (positive) non-clashing teaching was thoroughly studied for balls in graphs, yielding numerous algorithmic and combinatorial results. In particular, Chalopin et al. [COLT 2024] and Ganian et al. [ICLR 2025] gave an almost complete picture of the complexity landscape of the positive variant, showing that it is tractable only for restricted graph classes due to the non-trivial nature of the problem and concept class. In this work, we consider (positive) non-clashing teaching for closed neighborhoods in graphs. This concept class is not only extensively studied in various related contexts, but it also exhibits broad generality, as any finite binary concept class can be equivalently represented by a set of closed neighborhoods in a graph. In comparison to the works on balls in graphs, we provide improved algorithmic results, notably including FPT algorithms for more general classes of parameters, and we complement these results by deriving stronger lower bounds. Lastly, we obtain combinatorial upper bounds for wider classes of graphs.
DSDec 30, 2025
Kidney Exchange: Faster Parameterized Algorithms and Tighter Lower BoundsAritra Banik, Sujoy Bhore, Palash Dey et al.
The kidney exchange mechanism allows many patient-donor pairs who are otherwise incompatible with each other to come together and exchange kidneys along a cycle. However, due to infrastructure and legal constraints, kidney exchange can only be performed in small cycles in practice. In reality, there are also some altruistic donors who do not have any paired patients. This allows us to also perform kidney exchange along paths that start from some altruistic donor. Unfortunately, the computational task is NP-complete. To overcome this computational barrier, an important line of research focuses on designing faster algorithms, both exact and using the framework of parameterized complexity. The standard parameter for the kidney exchange problem is the number $t$ of patients that receive a healthy kidney. The current fastest known deterministic FPT algorithm for this problem, parameterized by $t$, is $O^\star\left(14^t\right)$. In this work, we improve this by presenting a deterministic FPT algorithm that runs in time $O^\star\left((4e)^t\right)\approx O^\star\left(10.88^t\right)$. This problem is also known to be W[1]-hard parameterized by the treewidth of the underlying undirected graph. A natural question here is whether the kidney exchange problem admits an FPT algorithm parameterized by the pathwidth of the underlying undirected graph. We answer this negatively in this paper by proving that this problem is W[1]-hard parameterized by the pathwidth of the underlying undirected graph. We also present some parameterized intractability results improving the current understanding of the problem under the framework of parameterized complexity.
DSMar 15
Improved Online Hitting Set Algorithms for Structured and Geometric Set SystemsSujoy Bhore, Anupam Gupta, Amit Kumar
In the online hitting set problem, sets arrive over time, and the algorithm has to maintain a subset of elements that hit all the sets seen so far. Alon, Awerbuch, Azar, Buchbinder, and Naor (SICOMP 2009) gave an algorithm with competitive ratio $O(\log n \log m)$ for the (general) online hitting set and set cover problems for $m$ sets and $n$ elements; this is known to be tight for efficient online algorithms. Given this barrier for general set systems, we ask: can we break this double-logarithmic phenomenon for online hitting set/set cover on structured and geometric set systems? We provide an $O(\log n \log\log n)$-competitive algorithm for the weighted online hitting set problem on set systems with linear shallow-cell complexity, replacing the double-logarithmic factor in the general result by effectively a single logarithmic term. As a consequence of our results we obtain the first bounds for weighted online hitting set for natural geometric set families, thereby answering open questions regarding the gap between general and geometric weighted online hitting set problems.
MLMay 10
Optimal Regret for Single Index BanditsDevdan Dey, Sujoy Bhore, Avishek Ghosh
We study the $\textit{single-index bandit}$ problem, where rewards depend on an unknown one-dimensional projection of high-dimensional contexts through an unknown reward function. This model extends linear and generalized linear bandits to a nonparametric setting, and is particularly relevant when the reward function is not known in advance. While optimal regret guarantees are known for monotone reward functions, the general non-monotone case remains poorly understood, with the best known bound being $\tilde{\mathcal{O}}(T^{3/4})$ (under standard boundedness and Lipschitz assumptions on the reward function [Kang et al., 2025]). We close this gap by establishing the optimal regret for general single-index bandits. We propose a simple two-phase algorithm, namely, Zoomed Single Index Bandit with Upper Confidence Bound ($\texttt{ZoomSIB-UCB}$), that first estimates the projection direction via a normalized Stein estimator, and then reduces the problem to a one-dimensional bandit using discretization and finally use UCB. This approach achieves a regret of $\tilde{\mathcal{O}}(T^{2/3})$, and improves significantly upon prior work without any additional assumptions. We also prove a matching minimax lower bound of $\tildeΩ(T^{2/3})$, showing that the upper bound is essentially tight. Our upper and lower bounds together provide a sharp characterization of the regret in single-index bandits. Moreover, the empirical results further demonstrate the effectiveness and robustness of our approach.
CGMay 5
Visibility Queries in Simple PolygonsSujoy Bhore, Chih-Hung Liu, Anurag Murty Naredla et al.
Given a simple polygon $P$ with $n$ vertices, we consider the problem of constructing a data structure for visibility queries: for any query point $q \in P$, compute the visibility polygon of $q$ in $P$. To obtain $O(\log n + k)$ query time, where $k$ is the size of the visibility polygon of $q$, the previous best result requires $O(n^3)$ space. In this paper, we propose a new data structure that uses $O(n^{2+ε})$ space, for any $ε> 0$, while achieving the same query time. If only $O(n^2)$ space is available, the best known result provides $O(\log^2 n + k)$ query time. We improve this to $O(\log n \log \log n + k)$ time. When restricted to $o(n^2)$ space, the only previously known approach, aside from the $O(n)$-time algorithm that computes the visibility polygon without preprocessing, is an $O(n)$-space data structure that supports $O(k \log n)$-time queries. We construct a data structure using $O(n \log n)$ space that answers visibility queries in $O(n^{1/2+ε} + k)$ time. In addition, for the special case in which $q$ lies on the boundary of $P$, we build a data structure of $O(n \log n)$ space supporting $O(\log^2 n + k)$ query time; alternatively, we achieve $O(\log n + k)$ query time using $O(n^{1+ε})$ space. To achieve our results, we propose a new method for decomposing simple polygons, which may be of independent interest.
DSApr 5
DAG Covers: The Steiner Point EffectSujoy Bhore, Hsien-Chih Chang, Jonathan Conroy et al.
Given a weighted digraph $G$, a $(t,g,μ)$-DAG cover is a collection of $g$ dominating DAGs $D_1,\dots,D_g$ such that all distances are approximately preserved: for every pair $(u,v)$ of vertices, $\min_id_{D_i}(u,v)\le t\cdot d_{G}(u,v)$, and the total number of non-$G$ edges is bounded by $|(\cup_i D_i)\setminus G|\le μ$. Assadi, Hoppenworth, and Wein [STOC 25] and Filtser [SODA 26] studied DAG covers for general digraphs. This paper initiates the study of \emph{Steiner} DAG cover, where the DAGs are allowed to contain Steiner points. We obtain Steiner DAG covers on the important classes of planar digraphs and low-treewidth digraphs. Specifically, we show that any digraph with treewidth tw admits a $(1,2,\tilde{O}(n\cdot tw))$-Steiner DAG cover. For planar digraphs we provide a $(1+\varepsilon,2,\tilde{O}_\varepsilon(n))$-Steiner DAG cover. We also demonstrate a stark difference between Steiner and non-Steiner DAG covers. As a lower bound, we show that any non-Steiner DAG cover for graphs with treewidth $1$ with stretch $t<2$ and sub-quadratic number of extra edges requires $Ω(\log n)$ DAGs.
LGOct 27, 2025
Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density EstimationVed Danait, Srijan Das, Sujoy Bhore
Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density Estimation (A-KDE) are fundamental problems at the core of modern machine learning, with broad applications in data analysis, information systems, and large-scale decision making. In massive and dynamic data streams, a central challenge is to design compact sketches that preserve essential structural properties of the data while enabling efficient queries. In this work, we develop new sketching algorithms that achieve sublinear space and query time guarantees for both ANN and A-KDE for a dynamic stream of data. For ANN in the streaming model, under natural assumptions, we design a sublinear sketch that requires only $\mathcal{O}(n^{1+ρ-η})$ memory by storing only a sublinear ($n^{-η}$) fraction of the total inputs, where $ρ$ is a parameter of the LSH family, and $0<η<1$. Our method supports sublinear query time, batch queries, and extends to the more general Turnstile model. While earlier works have focused on Exact NN, this is the first result on ANN that achieves near-optimal trade-offs between memory size and approximation error. Next, for A-KDE in the Sliding-Window model, we propose a sketch of size $\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+ε} - 1} \log^2 N\right)$, where $R$ is the number of sketch rows, $W$ is the LSH range, $N$ is the window size, and $ε$ is the approximation error. This, to the best of our knowledge, is the first theoretical sublinear sketch guarantee for A-KDE in the Sliding-Window model. We complement our theoretical results with experiments on various real-world datasets, which show that the proposed sketches are lightweight and achieve consistently low error in practice.
CGSep 9, 2021
Worbel: Aggregating Point Labels into Word CloudsSujoy Bhore, Robert Ganian, Guangping Li et al.
Point feature labeling is a classical problem in cartography and GIS that has been extensively studied for geospatial point data. At the same time, word clouds are a popular visualization tool to show the most important words in text data which has also been extended to visualize geospatial data (Buchin et al. PacificVis 2016). In this paper, we study a hybrid visualization, which combines aspects of word clouds and point labeling. In the considered setting, the input data consists of a set of points grouped into categories and our aim is to place multiple disjoint and axis-aligned rectangles, each representing a category, such that they cover points of (mostly) the same category under some natural quality constraints. In our visualization, we then place category names inside the computed rectangles to produce a labeling of the covered points which summarizes the predominant categories globally (in a word-cloud-like fashion) while locally avoiding excessive misrepresentation of points (i.e., retaining the precision of point labeling). We show that computing a minimum set of such rectangles is NP-hard. Hence, we turn our attention to developing heuristics and exact SAT models to compute our visualizations. We evaluate our algorithms quantitatively, measuring running time and quality of the produced solutions, on several artificial and real-world data sets. Our experiments show that the heuristics produce solutions of comparable quality to the SAT models while running much faster.