Manjesh Kumar Hanawal

LG
h-index16
17papers
204citations
Novelty50%
AI Score56

17 Papers

CVFeb 11Code
FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference

Divya Jyoti Bajpai, Dhruv Bhardwaj, Soumya Roy et al.

Flow-matching models deliver state-of-the-art fidelity in image and video generation, but the inherent sequential denoising process renders them slower. Existing acceleration methods like distillation, trajectory truncation, and consistency approaches are static, require retraining, and often fail to generalize across tasks. We propose FastFlow, a plug-and-play adaptive inference framework that accelerates generation in flow matching models. FastFlow identifies denoising steps that produce only minor adjustments to the denoising path and approximates them without using the full neural network models used for velocity predictions. The approximation utilizes finite-difference velocity estimates from prior predictions to efficiently extrapolate future states, enabling faster advancements along the denoising path at zero compute cost. This enables skipping computation at intermediary steps. We model the decision of how many steps to safely skip before requiring a full model computation as a multi-armed bandit problem. The bandit learns the optimal skips to balance speed with performance. FastFlow integrates seamlessly with existing pipelines and generalizes across image generation, video generation, and editing tasks. Experiments demonstrate a speedup of over 2.6x while maintaining high-quality outputs. The source code for this work can be found at https://github.com/Div290/FastFlow.

LGApr 30, 2023
ICQ: A Quantization Scheme for Best-Arm Identification Over Bit-Constrained Channels

Fathima Zarin Faizal, Adway Girish, Manjesh Kumar Hanawal et al.

We study the problem of best-arm identification in a distributed variant of the multi-armed bandit setting, with a central learner and multiple agents. Each agent is associated with an arm of the bandit, generating stochastic rewards following an unknown distribution. Further, each agent can communicate the observed rewards with the learner over a bit-constrained channel. We propose a novel quantization scheme called Inflating Confidence for Quantization (ICQ) that can be applied to existing confidence-bound based learning algorithms such as Successive Elimination. We analyze the performance of ICQ applied to Successive Elimination and show that the overall algorithm, named ICQ-SE, has the order-optimal sample complexity as that of the (unquantized) SE algorithm. Moreover, it requires only an exponentially sparse frequency of communication between the learner and the agents, thus requiring considerably fewer bits than existing quantization schemes to successfully identify the best arm. We validate the performance improvement offered by ICQ with other quantization methods through numerical experiments.

LGMar 2
Stochastic Multi-Armed Bandits with Limited Control Variates

Arun Verma, Manjesh Kumar Hanawal, Arun Rajkumar

Motivated by wireless networks where interference or channel state estimates provide partial insight into throughput, we study a variant of the classical stochastic multi-armed bandit problem in which the learner has limited access to auxiliary information. Recent work has shown that such auxiliary information, when available as control variates, can be used to get tighter confidence bounds, leading to lower regret. However, existing works assume that control variates are available in every round, which may not be realistic in several real-life scenarios. To address this, we propose UCB-LCV, an upper confidence bound (UCB) based algorithm that effectively combines the estimators obtained from rewards and control variates. When there is no control variate, UCB-LCV leads to a novel algorithm that we call UCB-NORMAL, outperforming its existing algorithms for the standard MAB setting with normally distributed rewards. Finally, we discuss variants of the proposed UCB-LCV that apply to general distributions and experimentally demonstrate that UCB-LCV outperforms existing bandit algorithms.

LGFeb 2, 2025Code
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

Early Exit (EE) techniques have emerged as a means to reduce inference latency in Deep Neural Networks (DNNs). The latency improvement and accuracy in these techniques crucially depend on the criteria used to make exit decisions. We propose a new decision criterion where exit classifiers are treated as experts BEEM and aggregate their confidence scores. The confidence scores are aggregated only if neighbouring experts are consistent in prediction as the samples pass through them, thus capturing their ensemble effect. A sample exits when the aggregated confidence value exceeds a threshold. The threshold is set using the error rates of the intermediate exits aiming to surpass the performance of conventional DNN inference. Experimental results on the COCO dataset for Image captioning and GLUE datasets for various language tasks demonstrate that our method enhances the performance of state-of-the-art EE methods, achieving improvements in speed-up by a factor 1.5x to 2.1x. When compared to the final layer, its accuracy is comparable in harder Image Captioning and improves in the easier language tasks. The source code for this work is publicly available at https://github.com/Div290/BEEM1/tree/main

LGSep 28, 2025Code
Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

Early-Exit Deep Neural Networks enable adaptive inference by allowing prediction at intermediary layers, significantly reducing computational costs and latency. Most of the early exit strategies greedily exit a sample at an intermediary layer if the confidence in class prediction exceeds a predefined threshold that is set using a static validation set. This is problematic as the model might be overconfident in a wrong class. Also, they are not robust to distribution shifts encountered in deployment, which can undermine model trustworthiness and accuracy. To address these challenges, we propose UAT that adapts the threshold for exit decisions using a Multi-Armed Bandit framework, enabling online, unsupervised adjustment of exit decisions. UAT makes decisions based on a new reward function that assesses predictive certainty and its reliability to balance computational efficiency and prediction quality while penalizing unnecessary late exits. We provide guarantees on risk achieved by UAT and validate its performance on diverse tasks spanning vision-language understanding, text generation, and classification. Our framework demonstrates consistent improvements in speedup (1.70-2.10x) with a minimal performance drop (<2%) as compared to full model performance. Our source code is available at https://github.com/Div290/UAT.

LGSep 15, 2025Code
Know What You Don't Know: Selective Prediction for Early Exit DNNs

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

Inference latency and trustworthiness of Deep Neural Networks (DNNs) are the bottlenecks in deploying them in critical applications like sensitive tasks. Early Exit (EE) DNNs overcome the latency issues by allowing samples to exit from intermediary layers if they attain `high' confidence scores on the predicted class. However, the DNNs are known to exhibit overconfidence, which can lead to many samples exiting early and render EE strategies untrustworthy. We use Selective Prediction (SP) to overcome this issue by checking the `hardness' of the samples rather than just relying on the confidence score alone. We propose SPEED, a novel approach that uses Deferral Classifiers (DCs) at each layer to check the hardness of samples before performing EEs. Specifically, the DCs identify if a sample is hard to predict at an intermediary layer, leading to hallucination, and defer it to an expert. Early detection of hard samples for inference prevents the wastage of computational resources and improves trust by deferring the hard samples to the expert. We demonstrate that EE aided with SP improves both accuracy and latency. Our method minimizes the risk of wrong prediction by $50\%$ with a speedup of $2.05\times$ as compared to the final layer. The anonymized source code is available at https://github.com/Div290/SPEED

DCDec 21, 2024Code
Distributed Inference on Mobile Edge and Cloud: A Data-Cartography based Clustering Approach

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

The large size of DNNs poses a significant challenge for deployment on devices with limited resources, such as mobile, edge, and IoT platforms. To address this issue, a distributed inference framework can be utilized. In this framework, a small-scale DNN (initial layers) is deployed on mobile devices, a larger version on edge devices, and the full DNN on the cloud. Samples with low complexity (easy) can be processed on mobile, those with moderate complexity (medium) on edge devices, and high complexity (hard) samples on the cloud. Given that the complexity of each sample is unknown in advance, the crucial question in distributed inference is determining the sample complexity for appropriate DNN processing. We introduce a novel method named \our{}, which leverages the Data Cartography approach initially proposed for enhancing DNN generalization. By employing data cartography, we assess sample complexity. \our{} aims to boost accuracy while considering the offloading costs from mobile to edge/cloud. Our experimental results on GLUE datasets, covering a variety of NLP tasks, indicate that our approach significantly lowers inference costs by more than 43\% while maintaining a minimal accuracy drop of less than 0.5\% compared to performing all inferences on the cloud. The source code is available at https://anonymous.4open.science/r/DIMEC-1B04.

CLApr 11, 2021Code
Unsupervised Learning of Explainable Parse Trees for Improved Generalisation

Atul Sahay, Ayush Maheshwari, Ritesh Kumar et al.

Recursive neural networks (RvNN) have been shown useful for learning sentence representations and helped achieve competitive performance on several natural language inference tasks. However, recent RvNN-based models fail to learn simple grammar and meaningful semantics in their intermediate tree representation. In this work, we propose an attention mechanism over Tree-LSTMs to learn more meaningful and explainable parse tree structures. We also demonstrate the superior performance of our proposed model on natural language inference, semantic relatedness, and sentiment analysis tasks and compare them with other state-of-the-art RvNN based methods. Further, we present a detailed qualitative and quantitative analysis of the learned parse trees and show that the discovered linguistic structures are more explainable, semantically meaningful, and grammatically correct than recent approaches. The source code of the paper is available at https://github.com/atul04/Explainable-Latent-Structures-Using-Attention.

CLMay 23, 2024
CEEBERT: Cross-Domain Inference in Early Exit BERT

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

Pre-trained Language Models (PLMs), like BERT, with self-supervision objectives exhibit remarkable performance and generalization across various tasks. However, they suffer in inference latency due to their large size. To address this issue, side branches are attached at intermediate layers, enabling early inference of samples without requiring them to pass through all layers. However, the challenge is to decide which layer to infer and exit each sample so that the accuracy and latency are balanced. Moreover, the distribution of the samples to be inferred may differ from that used for training necessitating cross-domain adaptation. We propose an online learning algorithm named Cross-Domain Inference in Early Exit BERT (CeeBERT) that dynamically determines early exits of samples based on the level of confidence at each exit point. CeeBERT learns optimal thresholds from domain-specific confidence observed at intermediate layers on the fly, eliminating the need for labeled data. Experimental results on five distinct datasets with BERT and ALBERT models demonstrate CeeBERT's ability to improve latency by reducing unnecessary computations with minimal drop in performance. By adapting to the threshold values, CeeBERT can speed up the BERT/ALBERT models by $2\times$ - $3.5\times$ with minimal drop in accuracy.

LGJan 13, 2025
A Survey of Early Exit Deep Neural Networks in NLP

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

Deep Neural Networks (DNNs) have grown increasingly large in size to achieve state of the art performance across a wide range of tasks. However, their high computational requirements make them less suitable for resource-constrained applications. Also, real-world datasets often consist of a mixture of easy and complex samples, necessitating adaptive inference mechanisms that account for sample difficulty. Early exit strategies offer a promising solution by enabling adaptive inference, where simpler samples are classified using the initial layers of the DNN, thereby accelerating the overall inference process. By attaching classifiers at different layers, early exit methods not only reduce inference latency but also improve the model robustness against adversarial attacks. This paper presents a comprehensive survey of early exit methods and their applications in NLP.

LGFeb 23, 2024
FAIR: Filtering of Automatically Induced Rules

Divya Jyoti Bajpai, Ayush Maheshwari, Manjesh Kumar Hanawal et al.

The availability of large annotated data can be a critical bottleneck in training machine learning algorithms successfully, especially when applied to diverse domains. Weak supervision offers a promising alternative by accelerating the creation of labeled training data using domain-specific rules. However, it requires users to write a diverse set of high-quality rules to assign labels to the unlabeled data. Automatic Rule Induction (ARI) approaches circumvent this problem by automatically creating rules from features on a small labeled set and filtering a final set of rules from them. In the ARI approach, the crucial step is to filter out a set of a high-quality useful subset of rules from the large set of automatically created rules. In this paper, we propose an algorithm (Filtering of Automatically Induced Rules) to filter rules from a large number of automatically induced rules using submodular objective functions that account for the collective precision, coverage, and conflicts of the rule set. We experiment with three ARI approaches and five text classification datasets to validate the superior performance of our algorithm with respect to several semi-supervised label aggregation approaches. Further, we show that achieves statistically significant results in comparison to existing rule-filtering approaches.

CVFeb 1
FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching

Divya Jyoti Bajpai, Shubham Agarwal, Apoorv Saxena et al.

Flow Matching (FM) has recently emerged as a powerful approach for high-quality visual generation. However, their prohibitively slow inference due to a large number of denoising steps limits their potential use in real-time or interactive applications. Existing acceleration methods, like distillation, truncation, or consistency training, either degrade quality, incur costly retraining, or lack generalization. We propose FlowCast, a training-free speculative generation framework that accelerates inference by exploiting the fact that FM models are trained to preserve constant velocity. FlowCast speculates future velocity by extrapolating current velocity without incurring additional time cost, and accepts it if it is within a mean-squared error threshold. This constant-velocity forecasting allows redundant steps in stable regions to be aggressively skipped while retaining precision in complex ones. FlowCast is a plug-and-play framework that integrates seamlessly with any FM model and requires no auxiliary networks. We also present a theoretical analysis and bound the worst-case deviation between speculative and full FM trajectories. Empirical evaluations demonstrate that FlowCast achieves $>2.5\times$ speedup in image generation, video generation, and editing tasks, outperforming existing baselines with no quality loss as compared to standard full generation.

LGOct 26, 2025
FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

Vision-language Models (VLMs) have made significant strides in visual understanding and query response generation, but often face challenges of high computational cost and inference latency due to autoregressive decoding. In this work, we introduce an imitation-learning-based Self-Speculative Decoding (SSD) framework, named FastVLM, to address these limitations. Our approach employs a lightweight draft model for token generation in an autoregressive manner, while a full model verifies these tokens non-autoregressively. Accepted tokens proceed seamlessly, while rejected tokens are corrected by the full model and used to guide the draft model's refinement. Through an imitation network, FastVLM enhances the draft model by integrating deeper level insights from the full model's architecture. Also, it maintains the performance integrity of the full model while training the draft model, achieving a balance between efficiency and accuracy. Our method speeds up the inference process by 1.55-1.85x as compared to the final layer with minimal loss in performance.

LGJan 19, 2024
I-SplitEE: Image classification in Split Computing DNNs with Early Exits

Divya Jyoti Bajpai, Aastha Jaiswal, Manjesh Kumar Hanawal

The recent advances in Deep Neural Networks (DNNs) stem from their exceptional performance across various domains. However, their inherent large size hinders deploying these networks on resource-constrained devices like edge, mobile, and IoT platforms. Strategies have emerged, from partial cloud computation offloading (split computing) to integrating early exits within DNN layers. Our work presents an innovative unified approach merging early exits and split computing. We determine the 'splitting layer', the optimal depth in the DNN for edge device computations, and whether to infer on edge device or be offloaded to the cloud for inference considering accuracy, computational efficiency, and communication costs. Also, Image classification faces diverse environmental distortions, influenced by factors like time of day, lighting, and weather. To adapt to these distortions, we introduce I-SplitEE, an online unsupervised algorithm ideal for scenarios lacking ground truths and with sequential data. Experimental validation using Caltech-256 and Cifar-10 datasets subjected to varied distortions showcases I-SplitEE's ability to reduce costs by a minimum of 55% with marginal performance degradation of at most 5%.

SPDec 30, 2023
HoloBeam: Learning Optimal Beamforming in Far-Field Holographic Metasurface Transceivers

Debamita Ghosh, Manjesh Kumar Hanawal, Nikola Zlatanova

Holographic Metasurface Transceivers (HMTs) are emerging as cost-effective substitutes to large antenna arrays for beamforming in Millimeter and TeraHertz wave communication. However, to achieve desired channel gains through beamforming in HMT, phase-shifts of a large number of elements need to be appropriately set, which is challenging. Also, these optimal phase-shifts depend on the location of the receivers, which could be unknown. In this work, we develop a learning algorithm using a {\it fixed-budget multi-armed bandit framework} to beamform and maximize received signal strength at the receiver for far-field regions. Our algorithm, named \Algo exploits the parametric form of channel gains of the beams, which can be expressed in terms of two {\it phase-shifting parameters}. Even after parameterization, the problem is still challenging as phase-shifting parameters take continuous values. To overcome this, {\it\HB} works with the discrete values of phase-shifting parameters and exploits their unimodal relations with channel gains to learn the optimal values faster. We upper bound the probability of {\it\HB} incorrectly identifying the (discrete) optimal phase-shift parameters in terms of the number of pilots used in learning. We show that this probability decays exponentially with the number of pilot signals. We demonstrate that {\it\HB} outperforms state-of-the-art algorithms through extensive simulations.

CRNov 27, 2021
Evading Malware Analysis Using Reverse Execution

Adhokshaj Mishra, Animesh Roy, Manjesh Kumar Hanawal

Malware is a security threat, and various means are adapted to detect and block them. In this paper, we demonstrate a method where malware can evade malware analysis. The method is based on single-step reverse execution of code using the self-debugging feature. We discuss how self-debugging code works and use that to derive reverse execution for any payload. Further, we demonstrate the feasibility of a detection evading malware through a real implementation that targets Linux x86-64 architecture for a reference implementation. The reference implementation produces one result when run in one direction and a different result when run in the reverse direction.

LGJun 15, 2015
Cheap Bandits

Manjesh Kumar Hanawal, Venkatesh Saligrama, Michal Valko et al.

We consider stochastic sequential learning problems where the learner can observe the \textit{average reward of several actions}. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of the actions to observe represent some (geographical) area. The importance of this setting is that in these applications, it is actually \textit{cheaper} to observe average reward of a group of actions rather than the reward of a single action. We show that when the reward is \textit{smooth} over a given graph representing the neighboring actions, we can maximize the cumulative reward of learning while \textit{minimizing the sensing cost}. In this paper we propose CheapUCB, an algorithm that matches the regret guarantees of the known algorithms for this setting and at the same time guarantees a linear cost again over them. As a by-product of our analysis, we establish a $Ω(\sqrt{dT})$ lower bound on the cumulative regret of spectral bandits for a class of graphs with effective dimension $d$.