Abhishek Sharma

CV
h-index56
65papers
4,383citations
Novelty50%
AI Score59

65 Papers

CLAug 17, 2023
Reinforced Self-Training (ReST) for Language Modeling

Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan et al. · deepmind

Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy using offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks in a compute and sample-efficient manner.

CRSep 10, 2022
Preserving Privacy in Federated Learning with Ensemble Cross-Domain Knowledge Distillation

Xuan Gong, Abhishek Sharma, Srikrishna Karanam et al.

Federated Learning (FL) is a machine learning paradigm where local nodes collaboratively train a central model while the training data remains decentralized. Existing FL methods typically share model parameters or employ co-distillation to address the issue of unbalanced data distribution. However, they suffer from communication bottlenecks. More importantly, they risk privacy leakage. In this work, we develop a privacy preserving and communication efficient method in a FL framework with one-shot offline knowledge distillation using unlabeled, cross-domain public data. We propose a quantized and noisy ensemble of local predictions from completely trained local models for stronger privacy guarantees without sacrificing accuracy. Based on extensive experiments on image classification and text classification tasks, we show that our privacy-preserving method outperforms baseline FL algorithms with superior performance in both accuracy and communication efficiency.

LGOct 16, 2022
Federated Learning with Privacy-Preserving Ensemble Attention Distillation

Xuan Gong, Liangchen Song, Rishi Vedula et al.

Federated Learning (FL) is a machine learning paradigm where many local nodes collaboratively train a central model while keeping the training data decentralized. This is particularly relevant for clinical applications since patient data are usually not allowed to be transferred out of medical facilities, leading to the need for FL. Existing FL methods typically share model parameters or employ co-distillation to address the issue of unbalanced data distribution. However, they also require numerous rounds of synchronized communication and, more importantly, suffer from a privacy leakage risk. We propose a privacy-preserving FL framework leveraging unlabeled public data for one-way offline knowledge distillation in this work. The central model is learned from local knowledge via ensemble attention distillation. Our technique uses decentralized and heterogeneous local data like existing FL approaches, but more importantly, it significantly reduces the risk of privacy leakage. We demonstrate that our method achieves very competitive performance with more robust privacy preservation based on extensive experiments on image classification, segmentation, and reconstruction tasks.

59.8AIJun 1
BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

Shannon Serrao, Soumitra Chatterjee, Dorina Strori et al.

Enterprise AI systems that translate natural language into SQL queries and orchestrate multi-step agentic reasoning pipelines require evaluation approaches fundamentally different from academic benchmarks. Spider and BIRD established execution-accuracy protocols; G-Eval and RAGAS advanced LLM-based assessment; and recent work such as Spider 2.0, BEAVER, and BIRD-Interact has begun to address enterprise and agentic dimensions. No single framework unifies text-to-SQL assessment with agentic behavior evaluation into a production-grade pipeline calibrated against human expert judgment. We present BADGER, developed at Merkle, a unified evaluation framework integrating text-to-SQL assessment with agentic behavior evaluation. BADGER offers three contributions. First, LLM-assisted SQL component extraction extending Spider methodology to handle CTE-heavy, dialect-specific SQL. Second, a hybrid execution accuracy metric (Hybrid-EX) resolving column-aliasing and numeric-tolerance brittleness by using an LLM to infer structural alignments before deterministic cell-level scoring. Validated on 150 human-annotated industry queries, Hybrid-EX achieves Cohen's kappa=0.717 [95% CI: 0.600-0.822] (Substantial agreement) and 87.3% balanced accuracy, outperforming all six competing frameworks (Delta-kappa: 0.322-0.502, all p<=0.001). Third, an enterprise agentic evaluation suite assembling RAGAS, G-Eval, and agent benchmark metrics into a unified pipeline; Excess Tool Usage is the sole novel element. BADGER runs entirely within the client's governed data environment, supports configurable LLM judge backends, and enables rapid prototyping of client-specific judges and metrics, serving as a continuous evaluation backbone rather than a one-time quality gate.

IRApr 7, 2022
A Joint Learning Approach for Semi-supervised Neural Topic Modeling

Jeffrey Chiu, Rajat Mittal, Neehal Tumma et al.

Topic models are some of the most popular ways to represent textual data in an interpret-able manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the extent of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for data-sets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.

AIFeb 1, 2023
iPAL: A Machine Learning Based Smart Healthcare Framework For Automatic Diagnosis Of Attention Deficit/Hyperactivity Disorder (ADHD)

Abhishek Sharma, Arpit Jain, Shubhangi Sharma et al.

ADHD is a prevalent disorder among the younger population. Standard evaluation techniques currently use evaluation forms, interviews with the patient, and more. However, its symptoms are similar to those of many other disorders like depression, conduct disorder, and oppositional defiant disorder, and these current diagnosis techniques are not very effective. Thus, a sophisticated computing model holds the potential to provide a promising diagnosis solution to this problem. This work attempts to explore methods to diagnose ADHD using combinations of multiple established machine learning techniques like neural networks and SVM models on the ADHD200 dataset and explore the field of neuroscience. In this work, multiclass classification is performed on phenotypic data using an SVM model. The better results have been analyzed on the phenotypic data compared to other supervised learning techniques like Logistic regression, KNN, AdaBoost, etc. In addition, neural networks have been implemented on functional connectivity from the MRI data of a sample of 40 subjects provided to achieve high accuracy without prior knowledge of neuroscience. It is combined with the phenotypic classifier using the ensemble technique to get a binary classifier. It is further trained and tested on 400 out of 824 subjects from the ADHD200 data set and achieved an accuracy of 92.5% for binary classification The training and testing accuracy has been achieved upto 99% using ensemble classifier.

LGApr 6, 2023
Decision-Focused Model-based Reinforcement Learning for Reward Transfer

Abhishek Sharma, Sonali Parbhoo, Omer Gottesman et al.

Model-based reinforcement learning (MBRL) provides a way to learn a transition model of the environment, which can then be used to plan personalized policies for different patient cohorts and to understand the dynamics involved in the decision-making process. However, standard MBRL algorithms are either sensitive to changes in the reward function or achieve suboptimal performance on the task when the transition model is restricted. Motivated by the need to use simple and interpretable models in critical domains such as healthcare, we propose a novel robust decision-focused (RDF) algorithm that learns a transition model that achieves high returns while being robust to changes in the reward function. We demonstrate our RDF algorithm can be used with several model classes and planning algorithms. We also provide theoretical and empirical evidence, on a variety of simulators and real patient data, that RDF can learn simple yet effective models that can be used to plan personalized policies.

CVJul 12, 2024
Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images

Tianyu Luan, Zhongpai Gao, Luyuan Xie et al.

We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruction. To overcome this limitation, our method employs a "Divide and Fuse (D&F)" strategy, reconstructing human body parts independently before fusing them, thereby ensuring robustness against occlusions. We design Human Part Parametric Models (HPPM) that independently reconstruct the mesh from a few shape and global-location parameters, without inter-part dependency. A specially designed fusion module then seamlessly integrates the reconstructed parts, even when only a few are visible. We harness a large volume of ground-truth SMPL data to train our parametric mesh models. To facilitate the training and evaluation of our method, we have established benchmark datasets featuring images of partially visible humans with HPPM annotations. Our experiments, conducted on these benchmark datasets, demonstrate the effectiveness of our D&F method, particularly in scenarios with substantial invisibility, where traditional approaches struggle to maintain reconstruction quality.

CVJul 20, 2024
Automated Patient Positioning with Learned 3D Hand Gestures

Zhongpai Gao, Abhishek Sharma, Meng Zheng et al.

Positioning patients for scanning and interventional procedures is a critical task that requires high precision and accuracy. The conventional workflow involves manually adjusting the patient support to align the center of the target body part with the laser projector or other guiding devices. This process is not only time-consuming but also prone to inaccuracies. In this work, we propose an automated patient positioning system that utilizes a camera to detect specific hand gestures from technicians, allowing users to indicate the target patient region to the system and initiate automated positioning. Our approach relies on a novel multi-stage pipeline to recognize and interpret the technicians' gestures, translating them into precise motions of medical devices. We evaluate our proposed pipeline during actual MRI scanning procedures, using RGB-Depth cameras to capture the process. Results show that our system achieves accurate and precise patient positioning with minimal technician intervention. Furthermore, we validate our method on HaGRID, a large-scale hand gesture dataset, demonstrating its effectiveness in hand detection and gesture recognition.

CVSep 6, 2022
Surya Namaskar: real-time advanced yoga pose recognition and correction for smart healthcare

Abhishek Sharma, Pranjal Sharma, Darshan Pincha et al.

Nowadays, yoga has gained worldwide attention because of increasing levels of stress in the modern way of life, and there are many ways or resources to learn yoga. The word yoga means a deep connection between the mind and body. Today there is substantial Medical and scientific evidence to show that the very fundamentals of the activity of our brain, our chemistry even our genetic content can be changed by practicing different systems of yoga. Suryanamaskar, also known as salute to the sun, is a yoga practice that combines eight different forms and 12 asanas(4 asana get repeated) devoted to the Hindu Sun God, Surya. Suryanamaskar offers a number of health benefits such as strengthening muscles and helping to control blood sugar levels. Here the Mediapipe Library is used to analyze Surya namaskar situations. Standing is detected in real time with advanced software, as one performs Surya namaskar in front of the camera. The class divider identifies the form as one of the following: Pranamasana, Hasta Padasana, Hasta Uttanasana, Ashwa - Sanchalan asana, Ashtanga Namaskar, Dandasana, or Bhujangasana and Svanasana. Deep learning-based techniques(CNN) are used to develop this model with model accuracy of 98.68 percent and an accuracy score of 0.75 to detect correct yoga (Surya Namaskar ) posture. With this method, the users can practice the desired pose and can check if the pose that the person is doing is correct or not. It will help in doing all the different poses of surya namaskar correctly and increase the efficiency of the yoga practitioner. This paper describes the whole framework which is to be implemented in the model.

49.9LGMay 12
Quantifying Potential Observation Missingness in Inverse Reinforcement Learning

Leo Benac, Abhishek Sharma, Alihan Huyuk et al.

Inverse reinforcement learning (IRL), which infers reward functions from demonstrations, is a valuable tool for modeling and understanding decision-making behavior. Many variants of IRL have been developed to capture complexities of human decision-making, such as subjective beliefs, imperfect planning, and dynamic goals. However, an often-overlooked issue in real-world behavioral datasets is that the recorded data may be missing observations that were available to the original decision-maker. In use-inspired settings such as healthcare, this can make expert actions appear suboptimal, even when they were near-optimal given the information available at the time. As a result, the rewards learned by standard IRL may be misleading. In this paper, we identify the minimal perturbations to the recorded observations needed for the expert's actions to appear optimal. We develop a practical algorithm for this problem and demonstrate its utility for quantifying the possible extent of missing observations in behavioral datasets through extensive experiments on synthetic navigation tasks, a cancer treatment simulator, and ICU treatment data.

LGSep 29, 2020Code
Geometric Matrix Completion: A Functional View

Abhishek Sharma, Maks Ovsjanikov

We propose a totally functional view of geometric matrix completion problem. Differently from existing work, we propose a novel regularization inspired from the functional map literature that is more interpretable and theoretically sound. On synthetic tasks with strong underlying geometric structure, our framework outperforms state of the art by a huge margin (two order of magnitude) demonstrating the potential of our approach. On real datasets, we achieve state-of-the-art results at a fraction of the computational effort of previous methods. Our code is publicly available at https://github.com/Not-IITian/functional-matrix-completion

CVSep 28, 2020Code
Weakly Supervised Deep Functional Map for Shape Matching

Abhishek Sharma, Maks Ovsjanikov

A variety of deep functional maps have been proposed recently, from fully supervised to totally unsupervised, with a range of loss functions as well as different regularization terms. However, it is still not clear what are minimum ingredients of a deep functional map pipeline and whether such ingredients unify or generalize all recent work on deep functional maps. We show empirically minimum components for obtaining state of the art results with different loss functions, supervised as well as unsupervised. Furthermore, we propose a novel framework designed for both full-to-full as well as partial to full shape matching that achieves state of the art results on several benchmark datasets outperforming even the fully supervised methods by a significant margin. Our code is publicly available at https://github.com/Not-IITian/Weakly-supervised-Functional-map

CLApr 10, 2020Code
Towards Automatic Generation of Questions from Long Answers

Shlok Kumar Mishra, Pranav Goel, Abhishek Sharma et al.

Automatic question generation (AQG) has broad applicability in domains such as tutoring systems, conversational agents, healthcare literacy, and information retrieval. Existing efforts at AQG have been limited to short answer lengths of up to two or three sentences. However, several real-world applications require question generation from answers that span several sentences. Therefore, we propose a novel evaluation benchmark to assess the performance of existing AQG systems for long-text answers. We leverage the large-scale open-source Google Natural Questions dataset to create the aforementioned long-answer AQG benchmark. We empirically demonstrate that the performance of existing AQG methods significantly degrades as the length of the answer increases. Transformer-based methods outperform other existing AQG methods on long answers in terms of automatic as well as human evaluation. However, we still observe degradation in the performance of our best performing models with increasing sentence length, suggesting that long answer QA is a challenging benchmark task for future research.

MLMar 31, 2020Code
Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence

Nicolas Donati, Abhishek Sharma, Maks Ovsjanikov

We present a novel learning-based approach for computing correspondences between non-rigid 3D shapes. Unlike previous methods that either require extensive training data or operate on handcrafted input descriptors and thus generalize poorly across diverse datasets, our approach is both accurate and robust to changes in shape structure. Key to our method is a feature-extraction network that learns directly from raw shape geometry, combined with a novel regularized map extraction layer and loss, based on the functional map representation. We demonstrate through extensive experiments in challenging shape matching scenarios that our method can learn from less training data than existing supervised approaches and generalizes significantly better than current descriptor-based learning methods. Our source code is available at: https://github.com/LIX-shape-analysis/GeomFmaps.

CVApr 2, 2019Code
Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking

Saurabh Sharma, Pavan Teja Varigonda, Prashast Bindal et al.

Monocular 3D human-pose estimation from static images is a challenging problem, due to the curse of dimensionality and the ill-posed nature of lifting 2D-to-3D. In this paper, we propose a Deep Conditional Variational Autoencoder based model that synthesizes diverse anatomically plausible 3D-pose samples conditioned on the estimated 2D-pose. We show that CVAE-based 3D-pose sample set is consistent with the 2D-pose and helps tackling the inherent ambiguity in 2D-to-3D lifting. We propose two strategies for obtaining the final 3D pose- (a) depth-ordering/ordinal relations to score and weight-average the candidate 3D-poses, referred to as OrdinalScore, and (b) with supervision from an Oracle. We report close to state of-the-art results on two benchmark datasets using OrdinalScore, and state-of-the-art results using the Oracle. We also show that our pipeline yields competitive results without paired image-to-3D annotations. The training and evaluation code is available at https://github.com/ssfootball04/generative_pose.

IVFeb 10, 2024
An Optimization Framework for Processing and Transfer Learning for the Brain Tumor Segmentation

Tianyi Ren, Ethan Honey, Harshitha Rebala et al.

Tumor segmentation from multi-modal brain MRI images is a challenging task due to the limited samples, high variance in shapes and uneven distribution of tumor morphology. The performance of automated medical image segmentation has been significant improvement by the recent advances in deep learning. However, the model predictions have not yet reached the desired level for clinical use in terms of accuracy and generalizability. In order to address the distinct problems presented in Challenges 1, 2, and 3 of BraTS 2023, we have constructed an optimization framework based on a 3D U-Net model for brain tumor segmentation. This framework incorporates a range of techniques, including various pre-processing and post-processing techniques, and transfer learning. On the validation datasets, this multi-modality brain tumor segmentation framework achieves an average lesion-wise Dice score of 0.79, 0.72, 0.74 on Challenges 1, 2, 3 respectively.

LGJul 17, 2025
Apple Intelligence Foundation Language Models: Tech Report 2025

Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang et al. · apple-ml, cmu

We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: i a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and ii a scalable server model built on a novel Parallel-Track Mixture-of-Experts PT-MoE transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global-local attention to deliver high quality with competitive cost on Apple's Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines. A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in our Responsible AI approach with safeguards like content filtering and locale-specific evaluation, as well as our commitment to protecting our users' privacy with innovations like Private Cloud Compute.

IVNov 27, 2024
Neural Finite-State Machines for Surgical Phase Recognition

Hao Ding, Zhongpai Gao, Benjamin Planche et al.

Surgical phase recognition (SPR) is crucial for applications in workflow optimization, performance evaluation, and real-time intervention guidance. However, current deep learning models often struggle with fragmented predictions, failing to capture the sequential nature of surgical workflows. We propose the Neural Finite-State Machine (NFSM), a novel approach that enforces temporal coherence by integrating classical state-transition priors with modern neural networks. NFSM leverages learnable global state embeddings as unique phase identifiers and dynamic transition tables to model phase-to-phase progressions. Additionally, a future phase forecasting mechanism employs repeated frame padding to anticipate upcoming transitions. Implemented as a plug-and-play module, NFSM can be integrated into existing SPR pipelines without changing their core architectures. We demonstrate state-of-the-art performance across multiple benchmarks, including a significant improvement on the BernBypass70 dataset - raising video-level accuracy by 0.9 points and phase-level precision, recall, F1-score, and mAP by 3.8, 3.1, 3.3, and 4.1, respectively. Ablation studies confirm each component's effectiveness and the module's adaptability to various architectures. By unifying finite-state principles with deep learning, NFSM offers a robust path toward consistent, long-term surgical video analysis.

IVNov 14, 2024
Deep Learning for Fetal Inflammatory Response Diagnosis in the Umbilical Cord

Marina A. Ayad, Ramin Nateghi, Abhishek Sharma et al.

Inflammation of the umbilical cord can be seen as a result of ascending intrauterine infection or other inflammatory stimuli. Acute fetal inflammatory response (FIR) is characterized by infiltration of the umbilical cord by fetal neutrophils, and can be associated with neonatal sepsis or fetal inflammatory response syndrome. Recent advances in deep learning in digital pathology have demonstrated favorable performance across a wide range of clinical tasks, such as diagnosis and prognosis. In this study we classified FIR from whole slide images (WSI). We digitized 4100 histological slides of umbilical cord stained with hematoxylin and eosin(H&E) and extracted placental diagnoses from the electronic health record. We build models using attention-based whole slide learning models. We compared strategies between features extracted by a model (ConvNeXtXLarge) pretrained on non-medical images (ImageNet), and one pretrained using histopathology images (UNI). We trained multiple iterations of each model and combined them into an ensemble. The predictions from the ensemble of models trained using UNI achieved an overall balanced accuracy of 0.836 on the test dataset. In comparison, the ensembled predictions using ConvNeXtXLarge had a lower balanced accuracy of 0.7209. Heatmaps generated from top accuracy model appropriately highlighted arteritis in cases of FIR 2. In FIR 1, the highest performing model assigned high attention to areas of activated-appearing stroma in Wharton's Jelly. However, other high-performing models assigned attention to umbilical vessels. We developed models for diagnosis of FIR from placental histology images, helping reduce interobserver variability among pathologists. Future work may examine the utility of these models for identifying infants at risk of systemic inflammatory response or early onset neonatal sepsis.

CVNov 4, 2024
Machine learning identification of maternal inflammatory response and histologic choroamnionitis from placental membrane whole slide images

Abhishek Sharma, Ramin Nateghi, Marina Ayad et al.

The placenta forms a critical barrier to infection through pregnancy, labor and, delivery. Inflammatory processes in the placenta have short-term, and long-term consequences for offspring health. Digital pathology and machine learning can play an important role in understanding placental inflammation, and there have been very few investigations into methods for predicting and understanding Maternal Inflammatory Response (MIR). This work intends to investigate the potential of using machine learning to understand MIR based on whole slide images (WSI), and establish early benchmarks. To that end, we use Multiple Instance Learning framework with 3 feature extractors: ImageNet-based EfficientNet-v2s, and 2 histopathology foundation models, UNI and Phikon to investigate predictability of MIR stage from histopathology WSIs. We also interpret predictions from these models using the learned attention maps from these models. We also use the MIL framework for predicting white blood cells count (WBC) and maximum fever temperature ($T_{max}$). Attention-based MIL models are able to classify MIR with a balanced accuracy of up to 88.5% with a Cohen's Kappa ($κ$) of up to 0.772. Furthermore, we found that the pathology foundation models (UNI and Phikon) are both able to achieve higher performance with balanced accuracy and $κ$, compared to ImageNet-based feature extractor (EfficientNet-v2s). For WBC and $T_{max}$ prediction, we found mild correlation between actual values and those predicted from histopathology WSIs. We used MIL framework for predicting MIR stage from WSIs, and compared effectiveness of foundation models as feature extractors, with that of an ImageNet-based model. We further investigated model failure cases and found them to be either edge cases prone to interobserver variability, examples of pathologist's overreach, or mislabeled due to processing errors.

LGOct 12, 2024
Decision-Point Guided Safe Policy Improvement

Abhishek Sharma, Leo Benac, Sonali Parbhoo et al.

Within batch reinforcement learning, safe policy improvement (SPI) seeks to ensure that the learnt policy performs at least as well as the behavior policy that generated the dataset. The core challenge in SPI is seeking improvements while balancing risk when many state-action pairs may be infrequently visited. In this work, we introduce Decision Points RL (DPRL), an algorithm that restricts the set of state-action pairs (or regions for continuous states) considered for improvement. DPRL ensures high-confidence improvement in densely visited states (i.e. decision points) while still utilizing data from sparsely visited states. By appropriately limiting where and how we may deviate from the behavior policy, we achieve tighter bounds than prior work; specifically, our data-dependent bounds do not scale with the size of the state and action spaces. In addition to the analysis, we demonstrate that DPRL is both safe and performant on synthetic and real datasets.

CVFeb 12, 2024
PBADet: A One-Stage Anchor-Free Approach for Part-Body Association

Zhongpai Gao, Huayi Zhou, Abhishek Sharma et al.

The detection of human parts (e.g., hands, face) and their correct association with individuals is an essential task, e.g., for ubiquitous human-machine interfaces and action recognition. Traditional methods often employ multi-stage processes, rely on cumbersome anchor-based systems, or do not scale well to larger part sets. This paper presents PBADet, a novel one-stage, anchor-free approach for part-body association detection. Building upon the anchor-free object representation across multi-scale feature maps, we introduce a singular part-to-body center offset that effectively encapsulates the relationship between parts and their parent bodies. Our design is inherently versatile and capable of managing multiple parts-to-body associations without compromising on detection accuracy or robustness. Comprehensive experiments on various datasets underscore the efficacy of our approach, which not only outperforms existing state-of-the-art techniques but also offers a more streamlined and efficient solution to the part-body association challenge.

APP-PHJan 4
Image Synthesis Using Spintronic Deep Convolutional Generative Adversarial Network

Saumya Gupta, Abhinandan, Venkatesh vadde et al.

The computational requirements of generative adversarial networks (GANs) exceed the limit of conventional Von Neumann architectures, necessitating energy efficient alternatives such as neuromorphic spintronics. This work presents a hybrid CMOS-spintronic deep convolutional generative adversarial network (DCGAN) architecture for synthetic image generation. The proposed generative vision model approach follows the standard framework, leveraging generator and discriminators adversarial training with our designed spintronics hardware for deconvolution, convolution, and activation layers of the DCGAN architecture. To enable hardware aware spintronic implementation, the generator's deconvolution layers are restructured as zero padded convolution, allowing seamless integration with a 6-bit skyrmion based synapse in a crossbar, without compromising training performance. Nonlinear activation functions are implemented using a hybrid CMOS domain wall based Rectified linear unit (ReLU) and Leaky ReLU units. Our proposed tunable Leaky ReLU employs domain wall position coded, continuous resistance states and a piecewise uniaxial parabolic anisotropy profile with a parallel MTJ readout, exhibiting energy consumption of 0.192 pJ. Our spintronic DCGAN model demonstrates adaptability across both grayscale and colored datasets, achieving Fr'echet Inception Distances (FID) of 27.5 for the Fashion MNIST and 45.4 for Anime Face datasets, with testing energy (training energy) of 4.9 nJ (14.97~nJ/image) and 24.72 nJ (74.7 nJ/image).

LGSep 24, 2025
Dynamic Lagging for Time-Series Forecasting in E-Commerce Finance: Mitigating Information Loss with A Hybrid ML Architecture

Abhishek Sharma, Anat Parush, Sumit Wadhwa et al.

Accurate forecasting in the e-commerce finance domain is particularly challenging due to irregular invoice schedules, payment deferrals, and user-specific behavioral variability. These factors, combined with sparse datasets and short historical windows, limit the effectiveness of conventional time-series methods. While deep learning and Transformer-based models have shown promise in other domains, their performance deteriorates under partial observability and limited historical data. To address these challenges, we propose a hybrid forecasting framework that integrates dynamic lagged feature engineering and adaptive rolling-window representations with classical statistical models and ensemble learners. Our approach explicitly incorporates invoice-level behavioral modeling, structured lag of support data, and custom stability-aware loss functions, enabling robust forecasts in sparse and irregular financial settings. Empirical results demonstrate an approximate 5% reduction in MAPE compared to baseline models, translating into substantial financial savings. Furthermore, the framework enhances forecast stability over quarterly horizons and strengthens feature target correlation by capturing both short- and long-term patterns, leveraging user profile attributes, and simulating upcoming invoice behaviors. These findings underscore the value of combining structured lagging, invoice-level closure modeling, and behavioral insights to advance predictive accuracy in sparse financial time-series forecasting.

LGJul 25, 2025
Exploring molecular assembly as a biosignature using mass spectrometry and machine learning

Lindsay A. Rutter, Abhishek Sharma, Ian Seet et al.

Molecular assembly offers a promising path to detect life beyond Earth, while minimizing assumptions based on terrestrial life. As mass spectrometers will be central to upcoming Solar System missions, predicting molecular assembly from their data without needing to elucidate unknown structures will be essential for unbiased life detection. An ideal agnostic biosignature must be interpretable and experimentally measurable. Here, we show that molecular assembly, a recently developed approach to measure objects that have been produced by evolution, satisfies both criteria. First, it is interpretable for life detection, as it reflects the assembly of molecules with their bonds as building blocks, in contrast to approaches that discount construction history. Second, it can be determined without structural elucidation, as it can be physically measured by mass spectrometry, a property that distinguishes it from other approaches that use structure-based information measures for molecular complexity. Whilst molecular assembly is directly measurable using mass spectrometry data, there are limits imposed by mission constraints. To address this, we developed a machine learning model that predicts molecular assembly with high accuracy, reducing error by three-fold compared to baseline models. Simulated data shows that even small instrumental inconsistencies can double model error, emphasizing the need for standardization. These results suggest that standardized mass spectrometry databases could enable accurate molecular assembly prediction, without structural elucidation, providing a proof-of-concept for future astrobiology missions.

CLApr 3, 2025
CoLa: Learning to Interactively Collaborate with Large Language Models

Abhishek Sharma, Dan Goldwasser

LLMs' remarkable ability to tackle a wide range of language tasks opened new opportunities for collaborative human-AI problem solving. LLMs can amplify human capabilities by applying their intuitions and reasoning strategies at scale. We explore whether human guides can be simulated, by generalizing from human demonstrations of guiding an AI system to solve complex language problems. We introduce CoLa, a novel self-guided learning paradigm for training automated $\textit{guides}$ and evaluate it on two QA datasets, a puzzle-solving task, and a constrained text generation task. Our empirical results show that CoLa consistently outperforms competitive approaches across all domains. Moreover, a small-sized trained guide outperforms a strong model like GPT-4 when acting as a guide. We compare the strategies employed by humans and automated guides by conducting a human study on a QA dataset. We show that automated guides outperform humans by adapting their strategies to reasoners' capabilities and conduct qualitative analyses highlighting distinct differences in guiding strategies.

AIJan 15, 2025
Temporal Reasoning in AI systems

Abhishek Sharma

Commonsense temporal reasoning at scale is a core problem for cognitive systems. The correct inference of the duration for which fluents hold is required by many tasks, including natural language understanding and planning. Many AI systems have limited deductive closure because they cannot extrapolate information correctly regarding existing fluents and events. In this study, we discuss the knowledge representation and reasoning schemes required for robust temporal projection in the Cyc Knowledge Base. We discuss how events can start and end risk periods for fluents. We then use discrete survival functions, which represent knowledge of the persistence of facts, to extrapolate a given fluent. The extrapolated intervals can be truncated by temporal constraints and other types of commonsense knowledge. Finally, we present the results of experiments to demonstrate that these methods obtain significant improvements in terms of Q/A performance.

AIJan 15, 2025
A Coordination-based Approach for Focused Learning in Knowledge-Based Systems

Abhishek Sharma

Recent progress in Learning by Reading and Machine Reading systems has significantly increased the capacity of knowledge-based systems to learn new facts. In this work, we discuss the problem of selecting a set of learning requests for these knowledge-based systems which would lead to maximum Q/A performance. To understand the dynamics of this problem, we simulate the properties of a learning strategy, which sends learning requests to an external knowledge source. We show that choosing an optimal set of facts for these learning systems is similar to a coordination game, and use reinforcement learning to solve this problem. Experiments show that such an approach can significantly improve Q/A performance.

AIJan 15, 2025
Growth Patterns of Inference

Abhishek Sharma

What properties of a first-order search space support/hinder inference? What kinds of facts would be most effective to learn? Answering these questions is essential for understanding the dynamics of deductive reasoning and creating large-scale knowledge-based learning systems that support efficient inference. We address these questions by developing a model of how the distribution of ground facts affects inference performance in search spaces. Experiments suggest that uniform search spaces are suitable for larger KBs whereas search spaces with skewed degree distribution show better performance in smaller KBs. A sharp transition in Q/A performance is seen in some cases, suggesting that analysis of the structure of search spaces with existing knowledge should be used to guide the acquisition of new ground facts in learning systems.

LGDec 30, 2024
Advancing Parkinson's Disease Progression Prediction: Comparing Long Short-Term Memory Networks and Kolmogorov-Arnold Networks

Abhinav Roy, Bhavesh Gyanchandani, Aditya Oza et al.

Parkinson's Disease (PD) is a degenerative neurological disorder that impairs motor and non-motor functions, significantly reducing quality of life and increasing mortality risk. Early and accurate detection of PD progression is vital for effective management and improved patient outcomes. Current diagnostic methods, however, are often costly, time-consuming, and require specialized equipment and expertise. This work proposes an innovative approach to predicting PD progression using regression methods, Long Short-Term Memory (LSTM) networks, and Kolmogorov Arnold Networks (KAN). KAN, utilizing spline-parametrized univariate functions, allows for dynamic learning of activation patterns, unlike traditional linear models. The Movement Disorder Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) is a comprehensive tool for evaluating PD symptoms and is commonly used to measure disease progression. Additionally, protein or peptide abnormalities are linked to PD onset and progression. Identifying these associations can aid in predicting disease progression and understanding molecular changes. Comparing multiple models, including LSTM and KAN, this study aims to identify the method that delivers the highest metrics. The analysis reveals that KAN, with its dynamic learning capabilities, outperforms other approaches in predicting PD progression. This research highlights the potential of AI and machine learning in healthcare, paving the way for advanced computational models to enhance clinical predictions and improve patient care and treatment strategies in PD management.

LGNov 7, 2024
Inverse Transition Learning: Learning Dynamics from Demonstrations

Leo Benac, Abhishek Sharma, Sonali Parbhoo et al.

We consider the problem of estimating the transition dynamics $T^*$ from near-optimal expert trajectories in the context of offline model-based reinforcement learning. We develop a novel constraint-based method, Inverse Transition Learning, that treats the limited coverage of the expert trajectories as a \emph{feature}: we use the fact that the expert is near-optimal to inform our estimate of $T^*$. We integrate our constraints into a Bayesian approach. Across both synthetic environments and real healthcare scenarios like Intensive Care Unit (ICU) patient management in hypotension, we demonstrate not only significant improvements in decision-making, but that our posterior can inform when transfer will be successful.

IVFeb 12, 2024
Re-DiffiNet: Modeling discrepancies in tumor segmentation using diffusion models

Tianyi Ren, Abhishek Sharma, Juampablo Heras Rivera et al.

Identification of tumor margins is essential for surgical decision-making for glioblastoma patients and provides reliable assistance for neurosurgeons. Despite improvements in deep learning architectures for tumor segmentation over the years, creating a fully autonomous system suitable for clinical floors remains a formidable challenge because the model predictions have not yet reached the desired level of accuracy and generalizability for clinical applications. Generative modeling techniques have seen significant improvements in recent times. Specifically, Generative Adversarial Networks (GANs) and Denoising-diffusion-based models (DDPMs) have been used to generate higher-quality images with fewer artifacts and finer attributes. In this work, we introduce a framework called Re-Diffinet for modeling the discrepancy between the outputs of a segmentation model like U-Net and the ground truth, using DDPMs. By explicitly modeling the discrepancy, the results show an average improvement of 0.55\% in the Dice score and 16.28\% in HD95 from cross-validation over 5-folds, compared to the state-of-the-art U-Net segmentation model.

CLDec 19, 2023
Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud et al.

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

CVJan 19, 2022
Real-time Recognition of Yoga Poses using computer Vision for Smart Health Care

Abhishek Sharma, Yash Shah, Yash Agrawal et al.

Nowadays, yoga has become a part of life for many people. Exercises and sports technological assistance is implemented in yoga pose identification. In this work, a self-assistance based yoga posture identification technique is developed, which helps users to perform Yoga with the correction feature in Real-time. The work also presents Yoga-hand mudra (hand gestures) identification. The YOGI dataset has been developed which include 10 Yoga postures with around 400-900 images of each pose and also contain 5 mudras for identification of mudras postures. It contains around 500 images of each mudra. The feature has been extracted by making a skeleton on the body for yoga poses and hand for mudra poses. Two different algorithms have been used for creating a skeleton one for yoga poses and the second for hand mudras. Angles of the joints have been extracted as a features for different machine learning and deep learning models. among all the models XGBoost with RandomSearch CV is most accurate and gives 99.2\% accuracy. The complete design framework is described in the present paper.

CVDec 5, 2021
Joint Symmetry Detection and Shape Matching for Non-Rigid Point Cloud

Abhishek Sharma, Maks Ovsjanikov

Despite the success of deep functional maps in non-rigid 3D shape matching, there exists no learning framework that models both self-symmetry and shape matching simultaneously. This is despite the fact that errors due to symmetry mismatch are a major challenge in non-rigid shape matching. In this paper, we propose a novel framework that simultaneously learns both self symmetry as well as a pairwise map between a pair of shapes. Our key idea is to couple a self symmetry map and a pairwise map through a regularization term that provides a joint constraint on both of them, thereby, leading to more accurate maps. We validate our method on several benchmarks where it outperforms many competitive baselines on both tasks.

CVDec 1, 2021
Object-Aware Cropping for Self-Supervised Learning

Shlok Mishra, Anshul Shah, Ankan Bansal et al.

A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised loss. The underlying assumption is that randomly cropped and resized regions of a given image share information about the objects of interest, which the learned representation will capture. This assumption is mostly satisfied in datasets such as ImageNet where there is a large, centered object, which is highly likely to be present in random crops of the full image. However, in other datasets such as OpenImages or COCO, which are more representative of real world uncurated data, there are typically multiple small objects in an image. In this work, we show that self-supervised learning based on the usual random cropping performs poorly on such datasets. We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm. This encourages the model to learn both object and scene level semantic representations. Using this approach, which we call object-aware cropping, results in significant improvements over scene cropping on classification and object detection benchmarks. For example, on OpenImages, our approach achieves an improvement of 8.8% mAP over random scene-level cropping using MoCo-v2 based pre-training. We also show significant improvements on COCO and PASCAL-VOC object detection and segmentation tasks over the state-of-the-art self-supervised learning approaches. Our approach is efficient, simple and general, and can be used in most existing contrastive and non-contrastive self-supervised learning frameworks.

LGOct 25, 2021
On Learning Prediction-Focused Mixtures

Abhishek Sharma, Catherine Zeng, Sanjana Narayanan et al.

Probabilistic models help us encode latent structures that both model the data and are ideally also useful for specific downstream tasks. Among these, mixture models and their time-series counterparts, hidden Markov models, identify discrete components in the data. In this work, we focus on a constrained capacity setting, where we want to learn a model with relatively few components (e.g. for interpretability purposes). To maintain prediction performance, we introduce prediction-focused modeling for mixtures, which automatically selects the dimensions relevant to the prediction task. Our approach identifies relevant signal from the input, outperforms models that are not prediction-focused, and is easy to optimize; we also characterize when prediction-focused modeling can be expected to work.

CVOct 6, 2021
Learning Canonical Embedding for Non-rigid Shape Matching

Abhishek Sharma, Maks Ovsjanikov

This paper provides a novel framework that learns canonical embeddings for non-rigid shape matching. In contrast to prior work in this direction, our framework is trained end-to-end and thus avoids instabilities and constraints associated with the commonly-used Laplace-Beltrami basis or sequential optimization schemes. On multiple datasets, we demonstrate that learning self symmetry maps with a deep functional map projects 3D shapes into a low dimensional canonical embedding that facilitates non-rigid shape correspondence via a simple nearest neighbor search. Our framework outperforms multiple recent learning based methods on FAUST and SHREC benchmarks while being computationally cheaper, data-efficient, and robust.

CVFeb 10, 2021
Scale Normalized Image Pyramids with AutoFocus for Object Detection

Bharat Singh, Mahyar Najibi, Abhishek Sharma et al.

We present an efficient foveal framework to perform object detection. A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales. Such a restriction of objects' size during training affords better learning of object-sensitive filters, and therefore, results in better accuracy. However, the use of an image pyramid increases the computational cost. Hence, we propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects (as object locations are known during training). The resulting approach, referred to as Scale Normalized Image Pyramid with Efficient Resampling or SNIPER, yields up to 3 times speed-up during training. Unfortunately, as object locations are unknown during inference, the entire image pyramid still needs processing. To this end, we adopt a coarse-to-fine approach, and predict the locations and extent of object-like regions which will be processed in successive scales of the image pyramid. Intuitively, it's akin to our active human-vision that first skims over the field-of-view to spot interesting regions for further processing and only recognizes objects at the right resolution. The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.

LGFeb 5, 2021
Matrix Decomposition on Graphs: A Functional View

Abhishek Sharma, Maks Ovsjanikov

We propose a functional view of matrix decomposition problems on graphs such as geometric matrix completion and graph regularized dimensionality reduction. Our unifying framework is based on the key idea that using a reduced basis to represent functions on the product space is sufficient to recover a low rank matrix approximation even from a sparse signal. We validate our framework on several real and synthetic benchmarks (for both problems) where it either outperforms state of the art or achieves competitive results at a fraction of the computational effort of prior work.

CVNov 3, 2020
Learning Visual Representations for Transfer Learning by Suppressing Texture

Shlok Mishra, Anshul Shah, Ankan Bansal et al.

Recent literature has shown that features obtained from supervised training of CNNs may over-emphasize texture rather than encoding high-level information. In self-supervised learning in particular, texture as a low-level cue may provide shortcuts that prevent the network from learning higher level representations. To address these problems we propose to use classic methods based on anisotropic diffusion to augment training using images with suppressed texture. This simple method helps retain important edge information and suppress texture at the same time. We empirically show that our method achieves state-of-the-art results on object detection and image classification with eight diverse datasets in either supervised or self-supervised learning tasks such as MoCoV2 and Jigsaw. Our method is particularly effective for transfer learning tasks and we observed improved performance on five standard transfer learning datasets. The large improvements (up to 11.49\%) on the Sketch-ImageNet dataset, DTD dataset and additional visual analyses with saliency maps suggest that our approach helps in learning better representations that better transfer.

CVJul 19, 2020
ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors

Rohun Tripathi, Vasu Singla, Mahyar Najibi et al.

The widely adopted sequential variant of Non Maximum Suppression (or Greedy-NMS) is a crucial module for object-detection pipelines. Unfortunately, for the region proposal stage of two/multi-stage detectors, NMS is turning out to be a latency bottleneck due to its sequential nature. In this article, we carefully profile Greedy-NMS iterations to find that a major chunk of computation is wasted in comparing proposals that are already far-away and have a small chance of suppressing each other. We address this issue by comparing only those proposals that are generated from nearby anchors. The translation-invariant property of the anchor lattice affords generation of a lookup table, which provides an efficient access to nearby proposals, during NMS. This leads to an Accelerated NMS algorithm which leverages Spatially Aware Priors, or ASAP-NMS, and improves the latency of the NMS step from 13.6ms to 1.2 ms on a CPU without sacrificing the accuracy of a state-of-the-art two-stage detector on COCO and VOC datasets. Importantly, ASAP-NMS is agnostic to image resolution and can be used as a simple drop-in module during inference. Using ASAP-NMS at run-time only, we obtain an mAP of 44.2\%@25Hz on the COCO dataset with a V100 GPU.

LGJul 10, 2020
Price Optimization in Fashion E-commerce

Sajan Kedia, Samyak Jain, Abhishek Sharma

With the rapid growth in the fashion e-commerce industry, it is becoming extremely challenging for the E-tailers to set an optimal price point for all the products on the platform. By establishing an optimal price point, they can maximize overall revenue and profit for the platform. In this paper, we propose a novel machine learning and optimization technique to find the optimal price point at an individual product level. It comprises three major components. Firstly, we use a demand prediction model to predict the next day demand for each product at a certain discount percentage. Next step, we use the concept of price elasticity of demand to get the multiple demand values by varying the discount percentage. Thus we obtain multiple price demand pairs for each product and we have to choose one of them for the live platform. Typically fashion e-commerce has millions of products, so there can be many permutations. Each permutation will assign a unique price point for all the products, which will sum up to a unique revenue number. To choose the best permutation which gives maximum revenue, a linear programming optimization technique is used. We have deployed the above methods in the live production environment and conducted several AB tests. According to the AB test result, our model is improving the revenue by 1 percent and gross margin by 0.81 percent.

CVMay 8, 2020
OpenEDS2020: Open Eyes Dataset

Cristina Palmero, Abhishek Sharma, Karsten Behrendt et al.

We present the second edition of OpenEDS dataset, OpenEDS2020, a novel dataset of eye-image sequences captured at a frame rate of 100 Hz under controlled illumination, using a virtual-reality head-mounted display mounted with two synchronized eye-facing cameras. The dataset, which is anonymized to remove any personally identifiable information on participants, consists of 80 participants of varied appearance performing several gaze-elicited tasks, and is divided in two subsets: 1) Gaze Prediction Dataset, with up to 66,560 sequences containing 550,400 eye-images and respective gaze vectors, created to foster research in spatio-temporal gaze estimation and prediction approaches; and 2) Eye Segmentation Dataset, consisting of 200 sequences sampled at 5 Hz, with up to 29,500 images, of which 5% contain a semantic segmentation label, devised to encourage the use of temporal information to propagate labels to contiguous frames. Baseline experiments have been evaluated on OpenEDS2020, one for each task, with average angular error of 5.37 degrees when performing gaze prediction on 1 to 5 frames into the future, and a mean intersection over union score of 84.1% for semantic segmentation. As its predecessor, OpenEDS dataset, we anticipate that this new dataset will continue creating opportunities to researchers in eye tracking, machine learning and computer vision communities, to advance the state of the art for virtual reality applications. The dataset is available for download upon request at http://research.fb.com/programs/openeds-2020-challenge/.

LGFeb 13, 2020
Listwise Learning to Rank with Deep Q-Networks

Abhishek Sharma

Learning to Rank is the problem involved with ranking a sequence of documents based on their relevance to a given query. Deep Q-Learning has been shown to be a useful method for training an agent in sequential decision making. In this paper, we show that DeepQRank, our deep q-learning to rank agent, demonstrates performance that can be considered state-of-the-art. Though less computationally efficient than a supervised learning approach such as linear regression, our agent has fewer limitations in terms of which format of data it can use for training and evaluation. We run our algorithm against Microsoft's LETOR listwise dataset and achieve an NDCG@1 (ranking accuracy in the range [0,1]) of 0.5075, narrowly beating out the leading supervised learning model, SVMRank (0.4958).

SEDec 16, 2019
Analyzing Offline Social Engagements: An Empirical Study of Meetup Events Related to Software Development

Abhishek Sharma, Gede Artha Azriadi Prana, Anamika Sawhney et al.

Software developers use a variety of social media channels and tools in order to keep themselves up to date, collaborate with other developers, and find projects to contribute to. Meetup is one of such social media used by software developers to organize community gatherings. Liu et al. characterized Meetup as an event-based social network (EBSN) which contains valuable offline social interactions in addition to online interactions. Recently, Storey et al. found out that Meetup was one of the social channels used by developers. We in this work investigate in detail the dynamics of Meetup groups and events related to software development, which has not been done in any of the previous works. First, we identified 6,317 Meetup groups related to software development and extracted 185,758 events organized by them. Then we took a statistically significant sample of 452 events on which we performed open coding, based on which we were able to develop 9 categories of events (8 main categories + Others). Next, we did a popularity analysis of the categories of events and found that Talks by Domain Experts, Hands-on Sessions, and Open Discussions are the most popular categories of events organized by Meetup groups related to software development. Our findings show that more popular categories are those where developers can learn and gain knowledge. On doing a diversity analysis of Meetup groups we found 19.82% of the members on an average are female, which is a larger proportion as compared to numbers reported in previous studies on other social media. From a broader software development community point of view information from this new forum can be valuable to identify and understand emerging topics and associations among them which can be helpful to identify future trends as well as current best practices.

LGOct 15, 2019
Reinforcement learning with a network of spiking agents

Sneha Aenugu, Abhishek Sharma, Sasikiran Yelamarthi et al.

Neuroscientific theory suggests that dopaminergic neurons broadcast global reward prediction errors to large areas of the brain influencing the synaptic plasticity of the neurons in those regions. We build on this theory to propose a multi-agent learning framework with spiking neurons in the generalized linear model (GLM) formulation as agents, to solve reinforcement learning (RL) tasks. We show that a network of GLM spiking agents connected in a hierarchical fashion, where each spiking agent modulates its firing policy based on local information and a global prediction error, can learn complex action representations to solve RL tasks. We further show how leveraging principles of modularity and population coding inspired from the brain can help reduce variance in the learning updates making it a viable optimization technique.

CVSep 24, 2019
Multi-Person 3D Human Pose Estimation from Monocular Images

Rishabh Dabral, Nitesh B Gundavarapu, Rahul Mitra et al.

Multi-person 3D human pose estimation from a single image is a challenging problem, especially for in-the-wild settings due to the lack of 3D annotated data. We propose HG-RCNN, a Mask-RCNN based network that also leverages the benefits of the Hourglass architecture for multi-person 3D Human Pose Estimation. A two-staged approach is presented that first estimates the 2D keypoints in every Region of Interest (RoI) and then lifts the estimated keypoints to 3D. Finally, the estimated 3D poses are placed in camera-coordinates using weak-perspective projection assumption and joint optimization of focal length and root translations. The result is a simple and modular network for multi-person 3D human pose estimation that does not require any multi-person 3D pose dataset. Despite its simple formulation, HG-RCNN achieves the state-of-the-art results on MuPoTS-3D while also approximating the 3D pose in the camera-coordinate system.

CLSep 9, 2019
Neural Conversational QA: Learning to Reason v.s. Exploiting Patterns

Nikhil Verma, Abhishek Sharma, Dhiraj Madan et al.

Neural Conversational QA tasks like ShARC require systems to answer questions based on the contents of a given passage. On studying recent state-of-the-art models on the ShARCQA task, we found indications that the models learn spurious clues/patterns in the dataset. Furthermore, we show that a heuristic-based program designed to exploit these patterns can have performance comparable to that of the neural models. In this paper we share our findings about four types of patterns found in the ShARC corpus and describe how neural models exploit them. Motivated by the aforementioned findings, we create and share a modified dataset that has fewer spurious patterns, consequently allowing models to learn better.