Satyam Kumar

LG
h-index: 18
14 papers
47 citations
Novelty: 46%
AI Score: 51

14 Papers

IR · May 29
An Industrial-Scale Sequential Recommender for LinkedIn Feed Ranking

Lars Hertel, Gaurav Srivastava, Syed Ali Naqvi et al.

LinkedIn Feed enables professionals worldwide to discover relevant content, build connections, and share knowledge at scale. We present Feed Sequential Recommender (Feed SR), a transformer-based sequential ranking model for LinkedIn Feed that replaces a DCNv2-based ranker and meets strict production constraints. We detail the modeling choices, training techniques, and serving optimizations that enable deployment at a scale of 1.2 billion members. Feed SR has been serving the majority of LinkedIn's Feed traffic for over three months and shows significant improvements in member engagement (+2.10% time spent, +3.52% likes, comments, or reshares) in online A/B tests compared to the existing production model. We also describe our deployment experience with alternative sequential and LLM-based ranking architectures and explain why Feed SR provided the best combination of online metrics and production efficiency.

AI · Aug 18, 2022
Explainable Reinforcement Learning on Financial Stock Trading using SHAP

Satyam Kumar, Mendhikar Vishal, Vadlamani Ravi

Explainable Artificial Intelligence (XAI) research has gained prominence in recent years in response to user communities' demand for greater transparency and trust in AI. This is especially critical because AI is being adopted in sensitive fields such as finance and medicine, where the implications for society, ethics, and safety are immense. Systematic evaluations of the literature indicate that XAI work has primarily focused on Machine Learning (ML) for categorization, decision, or action. To the best of our knowledge, no prior work offers an Explainable Reinforcement Learning (XRL) method for trading financial stocks. In this paper, we propose employing SHapley Additive exPlanations (SHAP) on a popular deep reinforcement learning architecture, the deep Q network (DQN), to explain an agent's action at a given instance in financial stock trading. To demonstrate the effectiveness of our method, we tested it on two popular datasets, SENSEX and DJIA, and report the results.
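To make the idea concrete, the sketch below computes exact Shapley values for the Q-value of the action a greedy agent would pick, by enumerating every feature subset and filling "absent" features from a baseline state. This is an illustration under stated assumptions, not the paper's pipeline: the `shap` package uses approximations (e.g. kernel- or gradient-based explainers) rather than full enumeration, and the three-feature state, the zero baseline and `toy_q_values` are hypothetical stand-ins for a trained DQN.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values.

    Enumerates all subsets, so it is only practical for a handful of features.
    """
    d = len(x)

    def value(subset):
        # Features in `subset` keep their observed values; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(d)]
        return f(z)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Standard Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy Q-network: Q-values for actions [hold, buy, sell]
# from a 3-feature state (e.g. normalized price change, volume, position).
def toy_q_values(state):
    price, volume, position = state
    return [0.1, 2.0 * price + 0.5 * volume * price, -1.5 * price + position]


state = [0.8, 0.4, 0.2]
baseline = [0.0, 0.0, 0.0]
q = toy_q_values(state)
action = max(range(len(q)), key=q.__getitem__)  # greedy action, as a DQN agent would pick
phi = shapley_values(lambda s: toy_q_values(s)[action], state, baseline)
```

Here the agent picks "buy", and the attributions sum to the gap between that action's Q-value at the observed state and at the baseline (the SHAP efficiency property); the interaction term `volume * price` is split equally between the two features involved.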

AI · Jul 31, 2023
Causal Inference for Banking, Finance, and Insurance: A Survey

Satyam Kumar, Yelleti Vivek, Vadlamani Ravi et al.

Causal inference plays a significant role in explaining the decisions made by statistical and artificial intelligence models. Of late, the field has begun attracting the attention of researchers and practitioners alike. This paper presents a comprehensive survey of 37 papers, published between 1992 and 2023, on the application of causal inference to banking, finance, and insurance. The papers are categorized into the following domain families: (i) Banking; (ii) Finance and its subdomains, such as corporate finance, governance finance (including financial risk and financial policy), financial economics, and behavioral finance; and (iii) Insurance. Further, the paper covers the primary ingredients of causal inference, namely statistical methods such as Bayesian Causal Networks and Granger Causality, and the associated terminology, such as counterfactuals. The review also recommends important directions for future research. In conclusion, we observe that the application of causal inference in the banking and insurance sectors is still in its infancy, leaving considerable scope for further research to turn it into a viable method.

LG · Aug 19, 2022
Application of Causal Inference to Analytical Customer Relationship Management in Banking and Insurance

Satyam Kumar, Vadlamani Ravi

Of late, to gain better acceptance across various domains, researchers have argued that machine intelligence algorithms must be able to provide explanations that humans can understand causally. This property, known as causability, represents a specific level of human-level explainability. A specific class of algorithms known as counterfactuals may be able to provide causability. In statistics, causality has been studied and applied for many years, but it has not been studied in great detail in artificial intelligence (AI). In a first-of-its-kind study, we employ the principles of causal inference to provide explainability for analytical customer relationship management (ACRM) problems. In the context of banking and insurance, current research on interpretability tries to address causality-related questions such as: why did the model make this decision, and was the model's choice influenced by a particular factor? We propose a solution in the form of an intervention, in which we study the effect on the target feature of changing the distribution of features in ACRM datasets. Subsequently, we also obtain a set of counterfactuals that can be furnished to any customer who requests an explanation of a decision taken by the bank or insurance company. Good-quality counterfactuals, requiring changes to no more than three features, were generated for the loan default, insurance fraud detection, and credit card fraud detection datasets; the credit card churn prediction dataset was the exception.
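The counterfactual idea (find the smallest change to a customer's features that flips the decision, touching at most three features) can be sketched as a brute-force search. This is a hypothetical illustration, not the authors' method: the helper name, the candidate-value grid and the rule-based `approve` model standing in for a trained classifier are all assumptions.

```python
from itertools import combinations, product


def find_counterfactual(predict, x, candidate_values, desired, max_changes=3):
    """Brute-force search for a counterfactual that changes as few features as possible.

    predict: callable mapping a feature dict to a class label.
    candidate_values: dict feature -> list of alternative values to try.
    Returns (counterfactual_dict, changed_features), or None if no change to at
    most max_changes features yields the desired label.
    """
    features = list(candidate_values)
    for k in range(1, max_changes + 1):          # try the smallest changes first
        for subset in combinations(features, k):
            for values in product(*(candidate_values[f] for f in subset)):
                cf = dict(x)
                cf.update(zip(subset, values))
                if predict(cf) == desired:
                    return cf, subset
    return None


# Hypothetical loan-approval rule standing in for a trained classifier.
def approve(applicant):
    return int(applicant["income"] >= 50 and applicant["debt"] <= 20)


applicant = {"income": 40, "debt": 30, "age": 35}
candidates = {"income": [45, 55], "debt": [25, 15], "age": [30]}
result = find_counterfactual(approve, applicant, candidates, desired=1)
```

No single-feature change approves this applicant, so the search returns the first two-feature change that does (income 55, debt 15). Exhaustive search grows combinatorially, which is one reason practical counterfactual generators use optimization or sampling instead.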

LG · Jul 18, 2022
Explainable Deep Belief Network based Auto encoder using novel Extended Garson Algorithm

Satyam Kumar, Vadlamani Ravi

Interpreting trained neural networks, even shallow ones, is one of the most difficult tasks in machine learning. Deep neural networks (DNNs) provide impressive results on a large number of tasks, but it is generally still unclear how such trained networks make their decisions. Feature importance is among the most important and popular interpretation techniques for shallow and deep neural networks. In this paper, we develop an algorithm that extends the idea of the Garson Algorithm to explain a Deep Belief Network based Auto-encoder (DBNA). It determines the contribution of each input feature in the DBN and can be used for any kind of neural network with many hidden layers. The effectiveness of this method is tested on both classification and regression datasets taken from the literature. Important features identified by this method are compared against those obtained by the Wald chi-square (χ²) test. For 2 out of 4 classification datasets and 2 out of 5 regression datasets, our proposed methodology identified better-quality features, leading to statistically more significant results vis-à-vis Wald χ².
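For reference, the classic Garson algorithm that the paper extends can be sketched for a network with one hidden layer and a single output: each input's absolute weight into a hidden unit, scaled by that unit's absolute output weight, is normalized per hidden unit and summed. This is only the single-hidden-layer baseline; the paper's extension to the many hidden layers of a DBN-based auto-encoder is not reproduced here, and the function name and toy weights are illustrative.

```python
def garson_importance(w_input_hidden, w_hidden_output):
    """Garson's relative importance for one hidden layer and one output.

    w_input_hidden[i][j]: weight from input i to hidden unit j.
    w_hidden_output[j]: weight from hidden unit j to the output.
    Returns one importance per input; the values sum to 1.
    """
    n_in, n_hidden = len(w_input_hidden), len(w_hidden_output)
    # Magnitude of each input's contribution routed through each hidden unit.
    contrib = [[abs(w_input_hidden[i][j]) * abs(w_hidden_output[j])
                for j in range(n_hidden)] for i in range(n_in)]
    # Normalize within each hidden unit so every hidden unit distributes a share of 1.
    col_totals = [sum(contrib[i][j] for i in range(n_in)) for j in range(n_hidden)]
    shares = [sum(contrib[i][j] / col_totals[j] for j in range(n_hidden) if col_totals[j] > 0)
              for i in range(n_in)]
    total = sum(shares)
    return [s / total for s in shares]


importance = garson_importance([[1.0, 0.0], [1.0, 2.0]], [1.0, 1.0])
```

Because of the per-hidden-unit normalization, the output weight cancels within each column, so in this single-output form only the relative input-to-hidden weight magnitudes matter; the second input receives 0.75 of the importance here.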

RO · Mar 21
Characterizing the onset and offset of motor imagery during passive arm movements induced by an upper-body exoskeleton

Kanishka Mitra, Frigyes Samuel Racz, Satyam Kumar et al.

Two distinct technologies have gained attention lately for their prospects in motor rehabilitation: robotics and brain-machine interfaces (BMIs). Combining them is a largely uncharted and promising direction with immense clinical potential. However, a significant challenge is whether the user's motor intentions can be accurately detected with non-invasive BMIs in the presence of instrumental noise and passive movements induced by the rehabilitation exoskeleton. As an alternative to the straightforward continuous control approach, this study aims to characterize the onset and offset of motor imagery during passive arm movements induced by an upper-body exoskeleton, to allow for natural control (initiation and termination) of functional movements. Ten participants were recruited to perform kinesthetic motor imagery (MI) of the right arm while attached to the robot, while simultaneously being cued by LEDs indicating the initiation and termination of a goal-oriented reaching task. Using electroencephalogram (EEG) signals, we built a decoder to detect the transitions between (i) rest and beginning MI and (ii) maintaining and ending MI. Offline decoder evaluation achieved group-average accuracies of 60.7% for onset and 66.6% for offset, revealing that the start and stop of MI could be identified while attached to the robot. Furthermore, pseudo-online evaluation replicated this performance, suggesting that reliable online exoskeleton control is feasible. Our approach showed that participants could produce high-quality, reliable sensorimotor rhythms regardless of noise or passive arm movements induced by wearing the exoskeleton, which opens new possibilities for BMI control of assistive devices.

RO · Mar 17
Real-Time Decoding of Movement Onset and Offset for Brain-Controlled Rehabilitation Exoskeleton

Kanishka Mitra, Satyam Kumar, Frigyes Samuel Racz et al.

Robot-assisted therapy can deliver high-dose, task-specific training after neurologic injury, but most systems act primarily at the limb level, engaging the impaired neural circuits only indirectly, which remains a key barrier to truly contingent, neuroplasticity-targeted rehabilitation. We address this gap by implementing online, dual-state motor imagery control of an upper-limb exoskeleton, enabling goal-directed reaches to be both initiated and terminated directly from non-invasive EEG. Eight participants used EEG to initiate assistance and then volitionally halt the robot mid-trajectory. Across two online sessions, group-mean hit rates were 61.5% for onset and 64.5% for offset, demonstrating reliable start-stop command delivery despite instrumental noise and passive arm motion. Methodologically, using an asymmetric margin diagnostic, we reveal a systematic, class-driven bias induced by common task-based recentering, and we introduce a class-agnostic, fixation-based recentering method that tracks drift without sampling command classes while preserving class geometry. This substantially improves threshold-free separability (AUC gains: onset +56%, p = 0.0117; offset +34%, p = 0.0251) and reduces bias within and across days. Together, these results help bridge offline decoding and practical, intention-driven start-stop control of a rehabilitation exoskeleton, enabling precisely timed, contingent assistance aligned with neuroplasticity goals while supporting future clinical translation.
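A minimal sketch of why the recentering reference matters, under simplifying assumptions that are not taken from the paper: features are plain vectors (real EEG pipelines often recenter covariance matrices instead), session drift is a constant offset, and recentering means subtracting a reference mean. The helper names and toy numbers are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def recenter(rows, reference_rows):
    """Subtract the mean of reference_rows from every row in rows."""
    ref = mean_vector(reference_rows)
    return [[v - r for v, r in zip(row, ref)] for row in rows]


# Hypothetical 1-D features for one session: a drift offset of +5 plus class signal.
drift = 5.0
fixation = [[drift - 0.1], [drift + 0.1]]   # fixation epochs carry no command class
onset_trials = [[drift + 1.0]] * 3          # majority class in this session
rest_trials = [[drift - 1.0]] * 1           # minority class
task = onset_trials + rest_trials

task_based = recenter(task, task)           # reference drawn from class-labelled trials
fixation_based = recenter(task, fixation)   # class-agnostic reference
```

With three onset trials and one rest trial, the task-based reference (5.5) is pulled toward the majority class, so the classes land at +0.5 and -1.5 instead of symmetrically around zero; the fixation-based reference (5.0) removes only the drift and leaves them at +1.0 and -1.0, which keeps a fixed decision threshold unbiased.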

DC · Apr 5
QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration

Satyam Kumar, Saurabh Jha

Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and reliability. Our prior QEIL v1 (Kumar & Jha, 2026) achieved a 4.82x IPW improvement but relied on static efficiency factors, greedy optimization, and unverified candidate selection. QEIL v2 replaces every static heuristic with physics-grounded, runtime-adaptive models. We introduce three device-workload metrics: DASI (roofline-derived compute utilization), CPQ (memory pressure from allocation theory), and Phi (thermal yield from CMOS leakage physics), which together form a unified energy equation in which every coefficient is traceable to semiconductor physics. For optimization, PGSAM (Pareto-Guided Simulated Annealing with Momentum) simultaneously minimizes energy, latency, and device underutilization. At inference time, the EAC/ARDE selection cascade with CSVET early stopping provides progressive verification among repeated samples. Evaluated on WikiText-103, GSM8K, and ARC-Challenge across seven model families (125M-8B parameters, including one pre-quantized variant), QEIL v2 achieves 75.7% pass@k at 63.8W (IPW=0.9749), a 2.86x improvement over standard inference. When applied to a 4-bit Llama-3.1-8B, QEIL v2's physics-grounded routing achieves IPW=1.024 at 54.8W, making it the first edge orchestration system to surpass the IPW=1.0 empirical reference mark, with the gain attributable entirely to QEIL v2's workload-adaptive device allocation on a model with reduced memory bandwidth requirements. Total energy drops by 75.6% relative to standard inference, with a 38.3% latency reduction, zero thermal throttling, and 100% fault recovery across all benchmarks and model families.
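Two textbook building blocks behind "roofline-derived compute utilization" and Pareto-guided multi-objective search can be sketched as follows. This is not QEIL v2's DASI metric or PGSAM optimizer: it only shows the classic roofline bound (attainable throughput is the smaller of peak compute and bandwidth times arithmetic intensity) and a Pareto-dominance filter over (energy, latency, underutilization) vectors. The device assignments and their scores are hypothetical numbers.

```python
def roofline_utilization(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Fraction of peak compute attainable under the classic roofline model.

    arithmetic_intensity is in FLOPs per byte moved from memory.
    """
    attainable = min(peak_gflops, bandwidth_gbs * arithmetic_intensity)
    return attainable / peak_gflops


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep (name, objectives) pairs whose objectives no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o[1], c[1]) for o in candidates)]


# Hypothetical device assignments scored as (energy J, latency ms, underutilization).
assignments = [
    ("all-gpu", (40.0, 12.0, 0.50)),
    ("gpu+npu", (28.0, 15.0, 0.20)),
    ("all-cpu", (60.0, 40.0, 0.30)),
    ("npu-only", (25.0, 30.0, 0.60)),
]
front = [name for name, _ in pareto_front(assignments)]
```

A memory-bound kernel at 50 FLOP/byte on a 10,000 GFLOP/s device with 100 GB/s bandwidth can reach only half of peak. In the toy search, "all-cpu" is dropped because "gpu+npu" beats it on every objective, while the remaining three represent different energy/latency trade-offs that a Pareto-guided optimizer would keep exploring.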

AR · Apr 14
Forge-UGC: FX optimization and register-graph engine for universal graph compiler

Satyam Kumar, Saurabh Jha

We present Forge-UGC (FX Optimization and Register-Graph Engine for Universal Graph Compilation), a four-phase compiler for transformer deployment on heterogeneous accelerator hardware, validated on the Intel AI Boost NPU. Existing frameworks such as OpenVINO and ONNX Runtime often have opaque compilation pipelines, limited pass-level visibility, and weak buffer management, which can lead to higher compilation cost and runtime overhead. Forge-UGC addresses these issues with a hardware-agnostic design that separates graph capture, optimization, intermediate representation lowering, and backend scheduling. Phase 1 captures graphs with torch.export at the ATen operator level, supporting modern transformer components such as rotary position embeddings, grouped-query attention, and SwiGLU without manual decomposition. Phase 2 applies six optimization passes (dead code elimination, common subexpression elimination, constant folding, attention fusion, operator fusion, and layout optimization), reducing graph node count by 14.2 to 21.9%. Phase 3 lowers the optimized graph into a typed intermediate representation with explicit virtual register assignments. Phase 4 performs liveness analysis, linear-scan buffer allocation (reducing peak buffer count by 30 to 48%), and device-affinity scheduling (reducing NPU-CPU transitions by 42 to 65%). Across six model families ranging from 125M to 8B parameters, evaluated on WikiText-103 and GLUE, Forge-UGC delivers 6.9 to 9.2x faster compilation than OpenVINO and ONNX Runtime, 18.2 to 35.7% lower inference latency, and 30.2 to 40.9% lower energy per inference. Fidelity is preserved, with maximum absolute logit differences below 2.1e-5 and KL divergence below 8.4e-9. We also introduce the Fusion Gain Ratio, the Compilation Efficiency Index, and per-pass execution profiling for systematic evaluation of NPU compilation pipelines.
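Phase 4's liveness analysis and linear-scan buffer allocation can be illustrated with a generic sketch. The op-list format, helper names and toy graph below are hypothetical and assume a straight-line program in SSA form (each value is defined once); this is the standard technique, not Forge-UGC's implementation.

```python
import heapq


def live_intervals(ops):
    """Liveness analysis over a straight-line op list in SSA form.

    ops: list of (output_name, [input_names]) in execution order.
    Returns (name, first_step, last_step) for each value defined by an op;
    graph inputs that no op defines are ignored.
    """
    first, last = {}, {}
    for step, (out, inputs) in enumerate(ops):
        for name in inputs:
            if name in first:
                last[name] = step      # extend the live range to this use
        first[out] = step
        last[out] = step
    return [(name, first[name], last[name]) for name in first]


def linear_scan_buffers(intervals):
    """Assign buffer ids so that values with overlapping live ranges never share one."""
    assignment = {}
    active = []  # min-heap of (last_step, buffer_id) for values still live
    free = []    # min-heap of buffer ids available for reuse
    num_buffers = 0
    for name, start, end in sorted(intervals, key=lambda t: (t[1], t[2])):
        # Release buffers whose values were last used before this step.
        while active and active[0][0] < start:
            _, buf = heapq.heappop(active)
            heapq.heappush(free, buf)
        if free:
            buf = heapq.heappop(free)
        else:
            buf = num_buffers
            num_buffers += 1
        assignment[name] = buf
        heapq.heappush(active, (end, buf))
    return assignment, num_buffers


ops = [("a", []), ("b", ["a"]), ("c", ["a", "b"]), ("d", ["c"])]
assignment, num_buffers = linear_scan_buffers(live_intervals(ops))
```

A value stays live through the step of its last use, so `c` cannot reuse the buffers of `a` or `b` (they are read while `c` is produced), but `d` can reuse buffer 0; peak usage is three buffers instead of four. A real compiler might additionally allow in-place reuse for element-wise ops, which this sketch does not model.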

DC · Feb 11
StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving

Satyam Kumar, Arpit Singh Gautam, Kailash Talreja et al.

Efficient LLM serving must balance throughput and latency across diverse, bursty workloads. We introduce StreamServe, a disaggregated prefill-decode serving architecture that combines metric-aware routing across compute lanes with adaptive speculative decoding, which tunes speculation depth online from runtime signals. StreamServe comprises four components: StreamScheduler for request orchestration, FlowGuard for multi-signal routing, PipeServe Engine for disaggregated prefill-decode execution on multiple GPUs, and SpecuStream for runtime-adaptive speculation. We evaluate StreamServe on four benchmarks (ALPACA, GSM8K, HUMANEVAL, and SUM), with 80 queries each and 320 in total, using four A800 40GB GPUs configured as two stream pairs. Across these workloads, StreamServe reduces latency by 11 to 18 times relative to tensor-parallel vLLM baselines and reaches a throughput of up to 2235 tokens per second on summarization tasks. Time per output token remains stable across configurations, indicating that the gains arise from architectural efficiency rather than token quality degradation. Although evaluated on a single-node, four-GPU setup, these results suggest that jointly adapting routing and speculation within a disaggregated framework creates a distinct operating regime for LLM inference.
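One way to "tune speculation depth online from runtime signals" is sketched below; it is a hypothetical policy, not SpecuStream's. It assumes the common simplification from the speculative decoding literature that each drafted token is accepted independently with probability alpha, so k drafted tokens yield (1 - alpha^(k+1)) / (1 - alpha) tokens per verification pass on average. Alpha is estimated from observed acceptances with exponentially decayed counts, and the depth maximizing tokens per unit cost is chosen. `SpeculationTuner`, the cost parameters and the decay are all assumptions.

```python
def expected_tokens(alpha, k):
    """Expected tokens per target-model verification pass when k tokens are drafted,
    assuming each drafted token is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return k + 1
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def choose_depth(alpha, draft_cost, target_cost, max_depth=8):
    """Pick the depth maximizing expected tokens per unit of (draft + verify) cost."""
    best_k, best_rate = 0, 0.0
    for k in range(max_depth + 1):
        rate = expected_tokens(alpha, k) / (k * draft_cost + target_cost)
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k


class SpeculationTuner:
    """Hypothetical online tuner that tracks acceptance with exponentially decayed counts."""

    def __init__(self, draft_cost, target_cost, max_depth=8, decay=0.9):
        self.draft_cost, self.target_cost = draft_cost, target_cost
        self.max_depth, self.decay = max_depth, decay
        self.accepted, self.trials = 1.0, 2.0  # weak prior: alpha = 0.5

    @property
    def alpha(self):
        return self.accepted / self.trials

    def observe(self, accepted, drafted):
        # Drafted tokens are checked left to right until the first rejection, so the
        # number of Bernoulli trials is the accepted count plus one if a rejection occurred.
        trials = accepted + (1 if accepted < drafted else 0)
        self.accepted = self.decay * self.accepted + accepted
        self.trials = self.decay * self.trials + trials

    def depth(self):
        return choose_depth(self.alpha, self.draft_cost, self.target_cost, self.max_depth)
```

With a cheap draft model (cost 0.1 versus 1.0 for a verification pass), a high acceptance rate such as 0.9 pushes the chosen depth to the cap, while an acceptance rate of 0 makes speculation pointless (depth 0). Decaying the counts lets the estimate follow workload shifts such as switching from summarization to code.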

LG · Jul 11, 2025
A Comprehensively Adaptive Architectural Optimization-Ingrained Quantum Neural Network Model for Cloud Workloads Prediction

Jitendra Kumar, Deepika Saxena, Kishu Gupta et al.

Accurate workload prediction and advanced resource reservation are crucial for managing dynamic cloud services. Traditional neural networks and deep learning models frequently struggle with diverse, high-dimensional workloads, especially during sudden changes in resource demand, leading to inefficiencies. This issue arises from their limited optimization during training, which relies only on parametric (inter-connection weight) adjustments using conventional algorithms. To address this issue, this work proposes a novel Comprehensively Adaptive Architectural Optimization-based Variable Quantum Neural Network (CA-QNN), which combines the efficiency of quantum computing with complete structural and qubit-vector parametric learning. The model converts workload data into qubits, which are processed through qubit neurons with Controlled-NOT-gated activation functions for intuitive pattern recognition. In addition, a comprehensive architecture optimization algorithm is introduced to facilitate learning and propagation of both the structure and the parametric values in variable-sized QNNs. This algorithm incorporates quantum adaptive modulation and size-adaptive recombination during training. The performance of the CA-QNN model is thoroughly investigated against seven state-of-the-art methods across four benchmark datasets of heterogeneous cloud workloads. The proposed model demonstrates superior prediction accuracy, reducing prediction errors by up to 93.40% and 91.27% compared to existing deep learning and QNN-based approaches, respectively.
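As a rough intuition for "converting workload data into qubits" and gating them with a Controlled-NOT, the sketch below simulates two real-amplitude qubits: a normalized workload value is angle-encoded as cos(θ)|0> + sin(θ)|1> with θ = x·π/2 (a common encoding, assumed here rather than taken from the paper), and a CNOT uses it as the control on a target qubit initialized to |0>. CA-QNN's actual qubit neuron, phases and learning rules are not reproduced; all names are illustrative.

```python
import math


def angle_encode(x):
    """Map x in [0, 1] to a single-qubit state cos(θ)|0> + sin(θ)|1>, θ = x·π/2."""
    theta = x * math.pi / 2
    return (math.cos(theta), math.sin(theta))


def tensor(control, target):
    """Two-qubit product state, amplitudes ordered |00>, |01>, |10>, |11> (control first)."""
    c0, c1 = control
    t0, t1 = target
    return [c0 * t0, c0 * t1, c1 * t0, c1 * t1]


def cnot(state):
    """Controlled-NOT: flips the target when the control is |1>, i.e. swaps |10> and |11>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


def prob_target_one(state):
    """Probability of measuring the target qubit as |1> (real amplitudes only)."""
    return state[1] ** 2 + state[3] ** 2


def gated_activation(load):
    # Encode a normalized workload value, entangle it with a |0> target via CNOT,
    # and read out the target's |1> probability, which equals sin^2(load·π/2).
    return prob_target_one(cnot(tensor(angle_encode(load), angle_encode(0.0))))
```

The readout is a smooth, bounded function of the input (0 at load 0, 0.5 at load 0.5, 1 at load 1), which is the kind of nonlinearity a gate-based neuron provides; trainable rotation angles would be added on top in a real QNN.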

SE · Sep 24, 2025
Intuition to Evidence: Measuring AI's True Impact on Developer Productivity

Anand Kumar, Vishal Khare, Deepak Sharma et al.

We present a comprehensive real-world evaluation of AI-assisted software development tools deployed at enterprise scale. Over one year, 300 engineers across multiple teams integrated an in-house AI platform (DeputyDev), which combines code generation and automated review capabilities, into their daily workflows. Through rigorous cohort analysis, our study demonstrates statistically significant productivity improvements, including an overall 31.8% reduction in PR review cycle time. Developer adoption was strong, with 85% satisfaction for code review features and 93% of developers expressing a desire to continue using the platform. Adoption scaled systematically from 4% engagement in month 1 to 83% peak usage by month 6, stabilizing at 60% active engagement. Top adopters achieved a 61% increase in code volume pushed to production; approximately 30 to 40% of code shipped to production went through the tool, accounting for an overall 28% increase in code shipment volume. Unlike controlled benchmark evaluations, our longitudinal analysis provides empirical evidence from production environments, revealing both the transformative potential and the practical deployment challenges of integrating AI into enterprise software development workflows.

AS · Jun 12, 2024
Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness

Satyam Kumar, Sai Srujana Buddi, Utkarsh Oggy Sarawgi et al.

Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speech enhancement, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies, the need for effective personalized VAD systems has become paramount. In this paper, we present a comparative analysis of Personalized Voice Activity Detection (PVAD) systems to assess their real-world effectiveness. We introduce a comprehensive approach to assess PVAD systems, incorporating various performance metrics such as frame-level and utterance-level error rates, detection latency and accuracy, alongside user-level analysis. Through extensive experimentation and evaluation, we provide a thorough understanding of the strengths and limitations of various PVAD variants. This paper advances the understanding of PVAD technology by offering insights into its efficacy and viability in practical applications using a comprehensive set of metrics.
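Two of the metric families named above, frame-level error rate and detection latency, can be sketched as follows. The definitions are common conventions assumed for illustration, not necessarily the paper's exact formulas: labels are per-frame integers (here 0 = non-speech, 1 = target speaker, 2 = non-target speaker), the frame hop is assumed to be 10 ms, and latency is measured from the first reference target-speech frame to the first frame the detector marks as target speech at or after it.

```python
def frame_error_rate(reference, hypothesis):
    """Fraction of frames whose predicted label differs from the reference label."""
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("reference and hypothesis must be non-empty and equal length")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)


def onset_latency_ms(reference, hypothesis, frame_ms=10, target=1):
    """Delay from the first reference target-speech frame to the first detection at or after it.

    Returns None if the reference has no target speech or the detector never fires afterwards.
    """
    try:
        onset = reference.index(target)
    except ValueError:
        return None
    for i in range(onset, len(hypothesis)):
        if hypothesis[i] == target:
            return (i - onset) * frame_ms
    return None


ref = [0, 0, 1, 1, 1, 2, 0]
hyp = [0, 0, 0, 1, 1, 1, 0]
fer = frame_error_rate(ref, hyp)
latency = onset_latency_ms(ref, hyp)
```

In this toy utterance the detector is one frame late at the onset and wrongly labels a non-target-speaker frame as target speech, giving 2 errors in 7 frames and a 10 ms onset latency. Utterance-level and user-level figures would aggregate these per-utterance values.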

QM · Jan 12, 2019
Divergence Framework for EEG based Multiclass Motor Imagery Brain Computer Interface

Satyam Kumar, Tharun Kumar Reddy, Laxmidhar Behera

As with most real-world data, the ubiquitous non-stationarities in EEG signals significantly perturb the feature distribution, thereby degrading the performance of Brain Computer Interfaces (BCIs). In this letter, we propose a novel method based on Joint Approximate Diagonalization (JAD) to optimize stationarity for multiclass motor imagery BCI within an information-theoretic framework. Specifically, the proposed method estimates the subspace that optimizes discriminability between classes while simultaneously preserving stationarity within the motor imagery classes. We determine this subspace through optimization using gradient descent on an orthogonal manifold. The performance of the proposed stationarity-enforcing algorithm is compared with that of baseline One-Versus-Rest (OVR) CSP and JAD on the publicly available BCI Competition IV dataset IIa. Results show an improvement in average classification accuracy across subjects over the baseline algorithms, underscoring the value of alleviating within-session non-stationarities.
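The trade-off between class discriminability and within-class stationarity can be made concrete with a divergence-based objective for a single spatial filter w. This is a simplified, hypothetical sketch rather than the paper's multiclass JAD subspace method: signals are treated as zero-mean Gaussians, so the projection onto w has variance wᵀCw, discriminability is the summed KL divergence between class-projected variances, and non-stationarity is the KL divergence from each epoch's projected variance to its class's, weighted by an assumed trade-off `lam`. The gradient descent on the orthogonal manifold is omitted.

```python
import math


def projected_variance(w, cov):
    """Variance w^T C w of the signal projected onto spatial filter w."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


def kl_gauss_1d(v1, v2):
    """KL(N(0, v1) || N(0, v2)) for zero-mean univariate Gaussians."""
    return 0.5 * (v1 / v2 - 1.0 + math.log(v2 / v1))


def divergence_objective(w, class_covs, epoch_covs, lam=0.5):
    """Between-class divergence minus lam times within-class non-stationarity.

    class_covs[k]: covariance of class k; epoch_covs[k]: list of epoch covariances of class k.
    """
    class_vars = [projected_variance(w, c) for c in class_covs]
    between = sum(kl_gauss_1d(a, b)
                  for i, a in enumerate(class_vars)
                  for j, b in enumerate(class_vars) if i != j)
    within = sum(kl_gauss_1d(projected_variance(w, c), class_vars[k])
                 for k, covs in enumerate(epoch_covs) for c in covs)
    return between - lam * within


class_covs = [[[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 2.0]]]
stationary_epochs = [[class_covs[0]], [class_covs[1]]]
score_channel_1 = divergence_objective([1.0, 0.0], class_covs, stationary_epochs)
score_mixed = divergence_objective([math.sqrt(0.5), math.sqrt(0.5)], class_covs, stationary_epochs)
```

Projecting onto the first channel separates the two classes (variances 2 and 1, score 0.25), whereas the equal mixture gives both classes variance 1.5 and a score near 0. If an epoch of class 1 drifted to variance 4 along the chosen filter, the within-class penalty would lower that filter's score, which is how such an objective steers the optimizer toward filters that are both discriminative and stable.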