LGApr 10, 2023
CAFIN: Centrality Aware Fairness inducing IN-processing for Unsupervised Representation Learning on GraphsArvindh Arun, Aakash Aanegola, Amul Agrawal et al.
Unsupervised Representation Learning on graphs is gaining traction due to the increasing abundance of unlabelled network data and the compactness, richness, and usefulness of the representations generated. In this context, the need to consider fairness and bias constraints while generating the representations has been well-motivated and studied to some extent in prior works. One major limitation of most of the prior works in this setting is that they do not aim to address the bias generated due to connectivity patterns in the graphs, such as varied node centrality, which leads to a disproportionate performance across nodes. In our work, we aim to address this issue of mitigating bias due to inherent graph structure in an unsupervised setting. To this end, we propose CAFIN, a centrality-aware fairness-inducing framework that leverages the structural information of graphs to tune the representations generated by existing frameworks. We deploy it on GraphSAGE (a popular framework in this domain) and showcase its efficacy on two downstream tasks - Node Classification and Link Prediction. Empirically, CAFIN consistently reduces the performance disparity across popular datasets (varying from 18 to 80% reduction in performance disparity) from various domains while incurring only a minimal cost of fairness.
61.1CLApr 16
From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step ReasoningKiran Purohit, Ramasuri Narayanam, Soumyabrata Pal
Speculative decoding (SD) accelerates large language model inference by allowing a lightweight draft model to propose outputs that a stronger target model verifies. However, its token-centric nature allows erroneous steps to propagate. Prior approaches mitigate this using external reward models, but incur additional latency, computational overhead, and limit generalizability. We propose SpecGuard, a verification-aware speculative decoding framework that performs step-level verification using only model-internal signals. At each step, SpecGuard samples multiple draft candidates and selects the most consistent step, which is then validated using an ensemble of two lightweight model-internal signals: (i) an attention-based grounding score that measures attribution to the input and previously accepted steps, and (ii) a log-probability-based score that captures token-level confidence. These signals jointly determine whether a step is accepted or recomputed using the target, allocating compute selectively. Experiments across a range of reasoning benchmarks show that SpecGuard improves accuracy by 3.6% while reducing latency by ~11%, outperforming both SD and reward-guided SD.
LGJan 12, 2025
Tab-Shapley: Identifying Top-k Tabular Data Quality InsightsManisha Padala, Lokesh Nagalapatti, Atharv Tyagi et al.
We present an unsupervised method for aggregating anomalies in tabular datasets by identifying the top-k tabular data quality insights. Each insight consists of a set of anomalous attributes and the corresponding subsets of records that serve as evidence to the user. The process of identifying these insight blocks is challenging due to (i) the absence of labeled anomalies, (ii) the exponential size of the subset search space, and (iii) the complex dependencies among attributes, which obscure the true sources of anomalies. Simple frequency-based methods fail to capture these dependencies, leading to inaccurate results. To address this, we introduce Tab-Shapley, a cooperative game theory based framework that uses Shapley values to quantify the contribution of each attribute to the data's anomalous nature. While calculating Shapley values typically requires exponential time, we show that our game admits a closed-form solution, making the computation efficient. We validate the effectiveness of our approach through empirical analysis on real-world tabular datasets with ground-truth anomaly labels.
LGOct 26, 2024
Sparse Linear Bandits with Blocking ConstraintsAdit Jain, Soumyabrata Pal, Sunav Choudhary et al.
We investigate the high-dimensional sparse linear bandits problem in a data-poor regime where the time horizon is much smaller than the ambient dimension and number of arms. We study the setting under the additional blocking constraint where each unique arm can be pulled only once. The blocking constraint is motivated by practical applications in personalized content recommendation and identification of data points to improve annotation efficiency for complex learning tasks. With mild assumptions on the arms, our proposed online algorithm (BSLB) achieves a regret guarantee of $\widetilde{\mathsf{O}}((1+β_k)^2k^{\frac{2}{3}} \mathsf{T}^{\frac{2}{3}})$ where the parameter vector has an (unknown) relative tail $β_k$ -- the ratio of $\ell_1$ norm of the top-$k$ and remaining entries of the parameter vector. To this end, we show novel offline statistical guarantees of the lasso estimator for the linear model that is robust to the sparsity modeling assumption. Finally, we propose a meta-algorithm (C-BSLB) based on corralling that does not need knowledge of optimal sparsity parameter $k$ at minimal cost to regret. Our experiments on multiple real-world datasets demonstrate the validity of our algorithms and theoretical framework.
LGOct 23, 2021
Game of Gradients: Mitigating Irrelevant Clients in Federated LearningLokesh Nagalapatti, Ramasuri Narayanam
The paradigm of Federated learning (FL) deals with multiple clients participating in collaborative training of a machine learning model under the orchestration of a central server. In this setup, each client's data is private to itself and is not transferable to other clients or the server. Though FL paradigm has received significant interest recently from the research community, the problem of selecting the relevant clients w.r.t. the central server's learning objective is under-explored. We refer to these problems as Federated Relevant Client Selection (FRCS). Because the server doesn't have explicit control over the nature of data possessed by each client, the problem of selecting relevant clients is significantly complex in FL settings. In this paper, we resolve important and related FRCS problems viz., selecting clients with relevant data, detecting clients that possess data relevant to a particular target label, and rectifying corrupted data samples of individual clients. We follow a principled approach to address the above FRCS problems and develop a new federated learning method using the Shapley value concept from cooperative game theory. Towards this end, we propose a cooperative game involving the gradients shared by the clients. Using this game, we compute Shapley values of clients and then present Shapley value based Federated Averaging (S-FedAvg) algorithm that empowers the server to select relevant clients with high probability. S-FedAvg turns out to be critical in designing specific algorithms to address the FRCS problems. We finally conduct a thorough empirical analysis on image classification and speech recognition tasks to show the superior performance of S-FedAvg than the baselines in the context of supervised federated learning settings.
SIDec 11, 2019
Beyond Node Embedding: A Direct Unsupervised Edge Representation Framework for Homogeneous NetworksSambaran Bandyopadhyay, Anirban Biswas, M. N. Murty et al.
Network representation learning has traditionally been used to find lower dimensional vector representations of the nodes in a network. However, there are very important edge driven mining tasks of interest to the classical network analysis community, which have mostly been unexplored in the network embedding space. For applications such as link prediction in homogeneous networks, vector representation (i.e., embedding) of an edge is derived heuristically just by using simple aggregations of the embeddings of the end vertices of the edge. Clearly, this method of deriving edge embedding is suboptimal and there is a need for a dedicated unsupervised approach for embedding edges by leveraging edge properties of the network. Towards this end, we propose a novel concept of converting a network to its weighted line graph which is ideally suited to find the embedding of edges of the original network. We further derive a novel algorithm to embed the line graph, by introducing the concept of collective homophily. To the best of our knowledge, this is the first direct unsupervised approach for edge embedding in homogeneous information networks, without relying on the node embeddings. We validate the edge embeddings on three downstream edge mining tasks. Our proposed optimization framework for edge embedding also generates a set of node embeddings, which are not just the aggregation of edges. Further experimental analysis shows the connection of our framework to the concept of node centrality.
CYDec 11, 2017
Cogniculture: Towards a Better Human-Machine Co-evolutionRakesh R Pimplikar, Kushal Mukherjee, Gyana Parija et al.
Research in Artificial Intelligence is breaking technology barriers every day. New algorithms and high performance computing are making things possible which we could only have imagined earlier. Though the enhancements in AI are making life easier for human beings day by day, there is constant fear that AI based systems will pose a threat to humanity. People in AI community have diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both human and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting sustenance, survival and evolution of the agents' life cycle. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure ethical behaviors of all cognitive agents. Along with a novel set of use-cases of Cogniculture, we discuss the road map ahead for this journey.
CLSep 1, 2016
All Fingers are not Equal: Intensity of References in Scientific ArticlesTanmoy Chakraborty, Ramasuri Narayanam
Research accomplishment is usually measured by considering all citations with equal importance, thus ignoring the wide variety of purposes an article is being cited for. Here, we posit that measuring the intensity of a reference is crucial not only to perceive better understanding of research endeavor, but also to improve the quality of citation-based applications. To this end, we collect a rich annotated dataset with references labeled by the intensity, and propose a novel graph-based semi-supervised model, GraLap to label the intensity of references. Experiments with AAN datasets show a significant improvement compared to the baselines to achieve the true labels of the references (46% better correlation). Finally, we provide four applications to demonstrate how the knowledge of reference intensity leads to design better real-world applications.