Ankit Gupta

LG
h-index: 83
31 papers
6,156 citations
Novelty: 43%
AI Score: 57

31 Papers

LG · Mar 27, 2022
Diagonal State Spaces are as Effective as Structured State Spaces

Ankit Gupta, Albert Gu, Jonathan Berant · deepmind, ibm-research

Modeling long range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video. While attention-based models are a popular and effective choice in modeling short-range interactions, their performance on tasks requiring long range reasoning has been largely inadequate. In an exciting result, Gu et al. (ICLR 2022) proposed the $\textit{Structured State Space}$ (S4) architecture delivering large gains over state-of-the-art models on several long-range tasks across various modalities. The core proposition of S4 is the parameterization of state matrices via a diagonal plus low rank structure, allowing efficient computation. In this work, we show that one can match the performance of S4 even without the low rank correction and thus assuming the state matrices to be diagonal. Our $\textit{Diagonal State Space}$ (DSS) model matches the performance of S4 on Long Range Arena tasks and on speech classification on the Speech Commands dataset, while being conceptually simpler and straightforward to implement.

CL · Sep 6, 2022
Analyzing Transformers in Embedding Space

Guy Dar, Mor Geva, Ankit Gupta et al. · deepmind, ibm-research

Understanding Transformer-based models has attracted significant attention, as they lie at the heart of recent technological advances across machine learning. While most interpretability methods rely on running models over inputs, recent work has shown that a zero-pass approach, where parameters are interpreted directly without a forward/backward pass, is feasible for some Transformer parameters, and for two-layer attention networks. In this work, we present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space, that is, the space of vocabulary items they operate on. We derive a simple theoretical framework to support our arguments and provide ample evidence for its validity. First, we present an empirical analysis showing that parameters of both pretrained and fine-tuned models can be interpreted in embedding space. Second, we present two applications of our framework: (a) aligning the parameters of different models that share a vocabulary, and (b) constructing a classifier without training by "translating" the parameters of a fine-tuned classifier to parameters of a different model that was only pretrained. Overall, our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.

LG · Jun 23, 2022
On the Parameterization and Initialization of Diagonal State Space Models

Albert Gu, Ankit Gupta, Karan Goel et al. · ibm-research

State space models (SSMs) have recently been shown to be very effective as a deep learning layer and a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies by using a prescribed state matrix called the HiPPO matrix. While this has an interpretable mathematical mechanism for modeling long dependencies, it introduces a custom representation and algorithm that can be difficult to implement. On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix. This work seeks to systematically understand how to parameterize and initialize such diagonal state space models. While it follows from classical results that almost all SSMs have an equivalent diagonal form, we show that the initialization is critical for performance. We explain why DSS works mathematically, by showing that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension. We also systematically describe various design choices in parameterizing and computing diagonal SSMs, and perform a controlled empirical study ablating the effects of these choices. Our final model S4D is a simple diagonal version of S4 whose kernel computation requires just 2 lines of code and performs comparably to S4 in almost all settings, with state-of-the-art results for image, audio, and medical time-series domains, and averaging 85% on the Long Range Arena benchmark.
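To make the idea of a diagonal SSM kernel concrete, here is a minimal pure-Python sketch. It is an illustration, not the authors' code: the zero-order-hold discretization, the function names, and the example parameters in the usage note are all assumptions introduced here. It computes the kernel $K[l] = \mathrm{Re}\sum_n C_n \bar{B}_n \bar{A}_n^l$ for a diagonal state matrix and checks it against the equivalent step-by-step recurrence.

```python
import cmath


def discretize(A, B, dt):
    """Zero-order-hold discretization of a diagonal SSM (an assumed choice)."""
    A_bar = [cmath.exp(dt * a) for a in A]
    B_bar = [(ab - 1) / a * b for ab, a, b in zip(A_bar, A, B)]
    return A_bar, B_bar


def diagonal_ssm_kernel(A, B, C, dt, length):
    """Return K[l] = Re(sum_n C[n] * B_bar[n] * A_bar[n]**l) for l < length."""
    A_bar, B_bar = discretize(A, B, dt)
    return [
        sum(c * bb * ab ** l for c, bb, ab in zip(C, B_bar, A_bar)).real
        for l in range(length)
    ]


def run_recurrence(A, B, C, dt, u):
    """Same model unrolled as x_k = A_bar * x_{k-1} + B_bar * u_k, y_k = Re(C . x_k)."""
    A_bar, B_bar = discretize(A, B, dt)
    x = [0j] * len(A)
    ys = []
    for u_k in u:
        x = [ab * xn + bb * u_k for ab, xn, bb in zip(A_bar, x, B_bar)]
        ys.append(sum(c * xn for c, xn in zip(C, x)).real)
    return ys


def causal_conv(kernel, u):
    """y_k = sum_{j <= k} kernel[k - j] * u[j]."""
    return [sum(kernel[k - j] * u[j] for j in range(k + 1)) for k in range(len(u))]
```

For any diagonal `A` with nonzero entries, `causal_conv(diagonal_ssm_kernel(A, B, C, dt, len(u)), u)` and `run_recurrence(A, B, C, dt, u)` should agree up to floating-point error, which is why such a layer can be trained as a convolution and run as a recurrence.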

LG · Oct 4, 2023
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors

Ido Amos, Jonathan Berant, Ankit Gupta · deepmind, ibm-research

Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have been by and large demonstrated on benchmarks (e.g. Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of the differences between architectures and that pretraining with standard denoising objectives, using $\textit{only the downstream task data}$, leads to dramatic gains across multiple architectures and to very small gaps between Transformers and state space models (SSMs). In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained, and we improve the best reported results of SSMs on the PathX-256 task by 20 absolute points. Subsequently, we analyze the utility of previously-proposed structured parameterizations for SSMs and show they become mostly redundant in the presence of data-driven initialization obtained through pretraining. Our work shows that, when evaluating different architectures on supervised tasks, incorporation of data-driven priors via pretraining is essential for reliable performance estimation, and can be done efficiently.

LG · Jun 27, 2022
Long Range Language Modeling via Gated State Spaces

Harsh Mehta, Ankit Gupta, Ashok Cutkosky et al. · ibm-research

State space models have been shown to be effective at modeling long range dependencies, especially on sequence classification tasks. In this work we focus on autoregressive sequence modeling over English books, GitHub source code and ArXiv mathematics articles. Based on recent developments around the effectiveness of gated activation functions, we propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4 (i.e. DSS) on TPUs, is fairly competitive with several well-tuned Transformer-based baselines and exhibits zero-shot generalization to longer inputs while being straightforward to implement. Finally, we show that leveraging self-attention to model local dependencies improves the performance of GSS even further.

LG · Dec 1, 2022
Simplifying and Understanding State Space Models with Diagonal Linear RNNs

Ankit Gupta, Harsh Mehta, Jonathan Berant · deepmind, ibm-research

Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step, and propose a model based on vanilla Diagonal Linear RNNs ($\mathrm{DLR}$). We empirically show that, despite being conceptually much simpler, $\mathrm{DLR}$ is as performant as previously-proposed SSMs on a variety of tasks and benchmarks including Long Range Arena and raw speech classification. Moreover, we characterize the expressivity of SSMs (including $\mathrm{DLR}$) and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs report near-perfect performance on tasks that can be modeled via $\textit{few}$ convolutional kernels, they struggle on tasks requiring $\textit{many}$ such kernels and especially when the desired sequence manipulation is $\textit{context-dependent}$. Despite these limitations, $\mathrm{DLR}$ reaches high performance on two higher-order reasoning tasks $\mathrm{ListOpsSubTrees}$ and $\mathrm{PathfinderSegmentation}\text{-}\mathrm{256}$ with input lengths $8K$ and $65K$ respectively, and gives encouraging performance on $\mathrm{PathfinderSegmentation}\text{-}\mathrm{512}$ with input length $262K$ for which attention is not a viable choice.
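As a rough illustration of why a task such as shifting an input sequence needs only one convolutional kernel, the sketch below builds a diagonal linear RNN that delays its input by a fixed number of steps. The construction (roots-of-unity eigenvalues with inverse-DFT output weights), the recurrence form $x_k = \lambda \odot x_{k-1} + u_k$, and all names are assumptions made for this example, not taken from the paper.

```python
import cmath


def dlr_shift_params(length, shift):
    """Eigenvalues and output weights whose kernel is 1 at lag `shift`, 0 elsewhere.

    The kernel is K[l] = sum_n w[n] * lam[n]**l. With lam[n] the `length`-th
    roots of unity and w[n] the inverse-DFT coefficients of a unit impulse at
    `shift`, K[l] is 1 for l == shift and 0 for every other l in [0, length).
    """
    lambdas = [cmath.exp(2j * cmath.pi * n / length) for n in range(length)]
    weights = [
        cmath.exp(-2j * cmath.pi * n * shift / length) / length
        for n in range(length)
    ]
    return lambdas, weights


def dlr_forward(lambdas, weights, u):
    """Run x_k = lam * x_{k-1} + u_k elementwise and y_k = Re(sum_n w[n] * x_k[n])."""
    state = [0j] * len(lambdas)
    outputs = []
    for u_k in u:
        state = [lam * s + u_k for lam, s in zip(lambdas, state)]
        outputs.append(sum(w * s for w, s in zip(weights, state)).real)
    return outputs
```

With `length=8` and `shift=3`, feeding in `[1, 2, ..., 8]` yields approximately `[0, 0, 0, 1, 2, 3, 4, 5]`. The construction is only exact for inputs no longer than `length`, because the kernel repeats with period `length`.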

QM · Sep 18, 2024
How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

Charlotte Bunne, Yusuf Roohani, Yanay Rosen et al.

The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of leveraging advances in AI to construct virtual cells, high-fidelity simulations of cells and cellular systems under different conditions that are directly learned from biological data across measurements and scales. We discuss desired capabilities of such AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using virtual instruments. We further address the challenges, opportunities and requirements to realize this vision including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions has come into reach.

OC · Nov 22, 2017
Variance reduction for antithetic integral control of stochastic reaction networks

Corentin Briat, Ankit Gupta, Mustafa Khammash

The antithetic integral feedback motif recently introduced in Briat, Gupta & Khammash (Cell Systems, 2017) is known to ensure robust perfect adaptation for the mean dynamics of a given molecular species involved in a complex stochastic biomolecular reaction network. However, it was observed that it also leads to a higher variance in the controlled network than that obtained when using a constitutive (i.e. open-loop) control strategy. This was interpreted as the cost of the adaptation property and may be viewed as a performance deterioration for the overall controlled network. To decrease this variance and improve the performance, we propose to combine the antithetic integral feedback motif with a negative feedback strategy. Both theoretical and numerical results are obtained. The theoretical ones are based on a tailored moment closure method allowing one to obtain approximate expressions for the stationary variance for the controlled network and predict that the variance can indeed be decreased by increasing the strength of the negative feedback. Numerical results verify the accuracy of this approximation and show that the controlled species variance can indeed be decreased, sometimes below its constitutive level. Three molecular networks are considered in order to verify the wide applicability of two types of negative feedback strategies. The main conclusion is that there is a trade-off between the speed of the settling-time of the mean trajectories and the stationary variance of the controlled species; i.e. smaller variance is associated with larger settling-time.

OC · Jan 13, 2016
Antithetic Integral Feedback ensures robust perfect adaptation in noisy biomolecular networks

Corentin Briat, Ankit Gupta, Mustafa Khammash

Homeostasis is a running theme in biology. Often achieved through feedback regulation strategies, homeostasis allows living cells to control their internal environment as a means for surviving changing and unfavourable environments. While many endogenous homeostatic motifs have been studied in living cells, some other motifs may remain under-explored or even undiscovered. At the same time, known regulatory motifs have been mostly analyzed at the deterministic level, and the effect of noise on their regulatory function has received little attention. Here we lay the foundation for a regulation theory at the molecular level that explicitly takes into account the noisy nature of biochemical reactions and provides novel tools for the analysis and design of robust homeostatic circuits. Using these ideas, we propose a new regulation motif, which we refer to as $\textit{antithetic integral feedback}$, and demonstrate its effectiveness as a strategy for generically regulating a wide class of reaction networks. By combining tools from probability and control theory, we show that the proposed motif preserves the stability of the overall network, steers the population of any regulated species to a desired set point, and achieves robust perfect adaptation, all with little prior knowledge of reaction rates. Moreover, our proposed regulatory motif can be implemented using a very small number of molecules and hence has a negligible metabolic load. Strikingly, the regulatory motif exploits stochastic noise, leading to enhanced regulation in scenarios where noise-free implementations result in dysregulation. Finally, we discuss the possible manifestation of the proposed antithetic integral feedback motif in endogenous biological circuits and its realization in synthetic circuits.

CV · Jun 26, 2023
Efficient High-Resolution Template Matching with Vector Quantized Nearest Neighbour Fields

Ankit Gupta, Ida-Maria Sintorn

Template matching is a fundamental problem in computer vision with applications in fields including object detection, image registration, and object tracking. Current methods rely on nearest-neighbour (NN) matching, where the query feature space is converted to NN space by representing each query pixel with its NN in the template. NN-based methods have been shown to perform better in occlusions, appearance changes, and non-rigid transformations; however, they scale poorly with high-resolution data and high feature dimensions. We present an NN-based method which efficiently reduces the NN computations and introduces filtering in the NN fields (NNFs). A vector quantization step is introduced before the NN calculation to represent the template with $k$ features, and the filter response over the NNFs is used to compare the template and query distributions over the features. We show that state-of-the-art performance is achieved in low-resolution data, and our method outperforms previous methods at higher resolution.

CV · Oct 20, 2022
Towards Better Guided Attention and Human Knowledge Insertion in Deep Convolutional Neural Networks

Ankit Gupta, Ida-Maria Sintorn

Attention Branch Networks (ABNs) have been shown to simultaneously provide visual explanation and improve the performance of deep convolutional neural networks (CNNs). In this work, we introduce Multi-Scale Attention Branch Networks (MSABN), which enhance the resolution of the generated attention maps, and improve the performance. We evaluate MSABN on benchmark image recognition and fine-grained recognition datasets where we observe MSABN outperforms ABN and baseline models. We also introduce a new data augmentation strategy utilizing the attention maps to incorporate human knowledge in the form of bounding box annotations of the objects of interest. We show that even with a limited number of edited samples, a significant performance gain can be achieved with this strategy.

CL · Jan 31, 2024 · Code
Exploring the limits of decoder-only models trained on public speech recognition corpora

Ankit Gupta, George Saon, Brian Kingsbury · ibm-research

The emergence of industrial-scale speech recognition (ASR) models such as Whisper and USM, trained on 1M hours of weakly labelled and 12M hours of audio-only proprietary data respectively, has led to a stronger need for large scale public ASR corpora and competitive open source pipelines. Unlike the said models, large language models are typically based on Transformer decoders, and it remains unclear if decoder-only models trained on public data alone can deliver competitive performance. In this work, we investigate factors such as choice of training datasets and modeling components necessary for obtaining the best performance using public English ASR corpora alone. Our Decoder-Only Transformer for ASR (DOTA) model comprehensively outperforms the encoder-decoder open source replication of Whisper (OWSM) on nearly all English ASR benchmarks and outperforms Whisper large-v3 on 7 out of 15 test sets. We release our codebase and model checkpoints under a permissive license.

GEO-PH · May 14
Three dimensional simulation of fluid-driven frictional and tensile ruptures on existing discontinuities

Brice Lecampion, Sylvain Brisson, Antareep Sarma et al.

We present an implicit, fully-coupled hydro-mechanical solver for the three dimensional simulation of fluid-driven rupture propagation along existing discontinuities. The solver handles simultaneously frictional slip (shear failure) and tensile opening (hydraulic fracture) along arbitrary intersecting fractures and faults in a linearly elastic and impermeable rock matrix. The spatial discretization combines a collocation displacement discontinuity boundary element method for quasi-static elasticity with a Galerkin finite element method for nonlinear pore-fluid diffusion along the discontinuities. Frictional and tensile failure are governed by a poro-elastoplastic cohesive zone like interface law with slip-weakening friction, dilatancy, and tensile strength degradation, integrated via an elastic predictor-plastic corrector scheme. The strong nonlinear coupling between mechanical deformation and fracture permeability is handled via adaptive implicit time-stepping. Efficient block preconditioning of the coupled tangent system, leveraging hierarchical matrix representations of the boundary element operator, is essential to achieve robustness across the full range of fracture behaviors. Accuracy and convergence are demonstrated against a comprehensive suite of analytical and semi-analytical solutions of increasing complexity: fluid-driven frictional ruptures under constant and slip-weakening friction, dilatant ruptures with permeability changes, and penny shaped hydraulic fractures spanning the viscosity-to-toughness transition. The solver is further assessed on two multi-fracture configurations: injection into three intersecting fractures, and a height-confined hydraulic fracture intersecting a strike-slip fault. The proposed framework simultaneously captures frictional slip, dilatancy, permeability evolution, and tensile opening.

CL · Jul 6, 2020 · Code
DART: Open-Domain Structured Data Record to Text Generation

Linyong Nan, Dragomir Radev, Rui Zhang et al.

We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks by utilizing techniques such as: tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimum post-editing. We present systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart.

LG · Apr 13
Can AI Detect Life? Lessons from Artificial Life

Ankit Gupta, Christoph Adami

Modern machine learning methods have been proposed to detect life in extraterrestrial samples, drawing on their ability to distinguish biotic from abiotic samples based on training models using natural and synthetic organic molecular mixtures. Here we show using Artificial Life that such methods are easily fooled into detecting life with near 100% confidence even if the analyzed sample is not capable of life. This is due to modern machine learning methods' propensity to be easily fooled by out-of-distribution samples. Because extra-terrestrial samples are very likely out of the distribution provided by terrestrial biotic and abiotic samples, using AI methods for life detection is bound to yield significant false positives.

CV · May 3
Video Active Perception: Effective Inference-Time Long-Form Video Understanding with Vision-Language Models

Martin Q. Ma, Willis Guo, Aditya Agrawal et al.

Large vision-language models (VLMs) have advanced multimodal tasks such as video question answering (QA). However, VLMs face the challenge of selecting frames effectively and efficiently, as standard uniform sampling is expensive and performance may plateau. Inspired by active perception theory, which posits that models gain information by acquiring data that differs from their expectations, we introduce Video Active Perception (VAP), a training-free method to enhance long-form video QA using VLMs. Our approach treats keyframe selection as data acquisition in active perception and leverages a lightweight text-conditioned video generation model to represent prior world knowledge. Empirically, VAP achieves state-of-the-art zero-shot results on long-form or reasoning video QA datasets such as EgoSchema, NExT-QA, ActivityNet-QA, IntentQA, and CLEVRER, achieving an increase of up to 5.6x in frame efficiency, measured in frames per question, over standard GPT-4o, Gemini 1.5 Pro, and LLaVA-OV. Moreover, VAP shows stronger reasoning abilities than previous methods and effectively selects keyframes relevant to questions. These findings highlight the potential of leveraging active perception to improve the frame effectiveness and efficiency of long-form video QA.

CV · Dec 5, 2025
SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling

Ankit Gupta, Christoph Adami, Emily Dolson

Deep neural networks (DNNs) excel across image recognition tasks, yet continue to exhibit overconfidence on inputs that bear no resemblance to natural images. Revisiting the "fooling images" work introduced by Nguyen et al. (2015), we re-implement both CPPN-based and direct-encoding-based evolutionary fooling attacks on modern architectures, including convolutional and transformer classifiers. Our re-implementation confirms that high-confidence fooling persists even in state-of-the-art networks, with transformer-based ViT-B/16 emerging as the most susceptible, achieving near-certain misclassifications with substantially fewer queries than convolution-based models. We then introduce SPOOF, a minimalist, consistent, and more efficient black-box attack generating high-confidence fooling images. Despite its simplicity, SPOOF generates unrecognizable fooling images with minimal pixel modifications and drastically reduced compute. Furthermore, retraining with fooling images as an additional class provides only partial resistance, as SPOOF continues to fool consistently with slightly higher query budgets, highlighting the persistent fragility of modern deep classifiers.

LG · Jul 14, 2025
Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop

Elizabeth Fahsbender, Alma Andersson, Jeremy Ash et al.

Artificial intelligence holds immense promise for transforming biology, yet a lack of standardized, cross-domain benchmarks undermines our ability to build robust, trustworthy models. Here, we present insights from a recent workshop that convened machine learning and computational biology experts across imaging, transcriptomics, proteomics, and genomics to tackle this gap. We identify major technical and systemic bottlenecks such as data heterogeneity and noise, reproducibility challenges, biases, and the fragmented ecosystem of publicly available resources, and propose a set of recommendations for building benchmarking frameworks that can efficiently compare ML models of biological systems across tasks and data modalities. By promoting high-quality data curation, standardized tooling, comprehensive evaluation metrics, and open, collaborative platforms, we aim to accelerate the development of robust benchmarks for AI-driven Virtual Cells. These benchmarks are crucial for ensuring rigor, reproducibility, and biological relevance, and will ultimately advance the field toward integrated models that drive new discoveries, therapeutic insights, and a deeper understanding of cellular systems.

CL · Jan 10, 2022
SCROLLS: Standardized CompaRison Over Long Language Sequences

Uri Shaham, Elad Segal, Maor Ivgi et al.

NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder, indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.

CL · Jun 13, 2021
Memory-efficient Transformers via Top-$k$ Attention

Ankit Gupta, Guy Dar, Shaya Goodman et al.

Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not possible to directly use them with popular pre-trained language models trained using vanilla attention, without an expensive corrective pre-training stage. In this work, we propose a simple yet highly accurate approximation for vanilla attention. We process the queries in chunks, and for each query, compute the top-$k$ scores with respect to the keys. Our approach offers several advantages: (a) its memory usage is linear in the input size, similar to linear attention variants, such as Performer and RFA (b) it is a drop-in replacement for vanilla attention that does not require any corrective pre-training, and (c) it can also lead to significant memory savings in the feed-forward layers after casting them into the familiar query-key-value framework. We evaluate the quality of top-$k$ approximation for multi-head attention layers on the Long Range Arena Benchmark, and for feed-forward layers of T5 and UnifiedQA on multiple QA datasets. We show our approach leads to accuracy that is nearly-identical to vanilla attention in multiple setups including training from scratch, fine-tuning, and zero-shot inference.
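The chunked top-$k$ procedure described above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function name, the `chunk_size` parameter, and the omission of the usual $1/\sqrt{d}$ score scaling are choices made here for brevity.

```python
import heapq
import math


def topk_attention(queries, keys, values, k, chunk_size):
    """Approximate softmax attention by keeping only the k largest scores per query."""
    dim_v = len(values[0])
    outputs = []
    for start in range(0, len(queries), chunk_size):
        chunk = queries[start:start + chunk_size]
        # Only a (chunk_size x num_keys) block of scores exists at any time.
        scores = [
            [sum(q_i * k_i for q_i, k_i in zip(q, key)) for key in keys]
            for q in chunk
        ]
        for row in scores:
            # Indices of the k highest-scoring keys for this query.
            top = heapq.nlargest(k, range(len(keys)), key=row.__getitem__)
            peak = max(row[i] for i in top)
            # Softmax restricted to the selected keys (shifted for stability).
            weights = [math.exp(row[i] - peak) for i in top]
            total = sum(weights)
            outputs.append([
                sum(w * values[i][d] for w, i in zip(weights, top)) / total
                for d in range(dim_v)
            ])
    return outputs
```

Setting `k` to the number of keys recovers ordinary softmax attention, and the result does not depend on `chunk_size`, which only bounds how many score rows are held in memory at once.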

LG · Mar 17, 2021
Value-aware Approximate Attention

Ankit Gupta, Jonathan Berant

Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. However, all approximations thus far have ignored the contribution of the $\textit{value vectors}$ to the quality of approximation. In this work, we argue that research efforts should be directed towards approximating the true output of the attention sub-layer, which includes the value vectors. We propose a value-aware objective, and show theoretically and empirically that an optimal approximation of a value-aware objective substantially outperforms an optimal approximation that ignores values, in the context of language modeling. Moreover, we show that the choice of kernel function for computing attention similarity can substantially affect the quality of sparse approximations, where kernel functions that are less skewed are more affected by the value vectors.

LG · Jun 5, 2020
GMAT: Global Memory Augmentation for Transformers

Ankit Gupta, Jonathan Berant

Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism and high performance. The contextualizing component of a Transformer block is the $\textit{pairwise dot-product}$ attention that has a large $\Omega(L^2)$ memory requirement for length $L$ sequences, limiting its ability to process long documents. This has been the subject of substantial interest recently, where multiple approximations were proposed to reduce the quadratic memory requirement using sparse attention matrices. In this work, we propose to augment sparse Transformer blocks with a dense attention-based $\textit{global memory}$ of length $M$ ($\ll L$) which provides an aggregate global view of the entire input sequence to each position. Our augmentation has a manageable $O(M\cdot(L+M))$ memory overhead, and can be seamlessly integrated with prior sparse solutions. Moreover, global memory can also be used for sequence compression, by representing a long input sequence with the memory representations only. We empirically show that our method leads to substantial improvement on a range of tasks, including (a) synthetic tasks that require global reasoning, (b) masked language modeling, and (c) reading comprehension.
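To give a feel for the gap between the quadratic cost and the $O(M\cdot(L+M))$ overhead, the sketch below counts attention-score entries. It assumes one plausible attention pattern (each memory slot attends to all $L+M$ positions, and each input position additionally attends to the $M$ memory slots); that pattern, the function names, and the example sizes are assumptions for illustration and are not taken from the abstract.

```python
def dense_attention_entries(seq_len):
    """Score entries for full pairwise attention over seq_len positions."""
    return seq_len * seq_len


def global_memory_overhead(seq_len, mem_len):
    """Extra score entries added by a dense global memory (assumed pattern)."""
    memory_rows = mem_len * (seq_len + mem_len)  # memory attends to everything
    memory_cols = seq_len * mem_len              # every input attends to memory
    return memory_rows + memory_cols
```

For a hypothetical `seq_len=4096` and `mem_len=128`, dense attention needs 16,777,216 entries while the memory overhead is 1,064,960, roughly sixteen times smaller. The overhead is at most `2 * mem_len * (seq_len + mem_len)`, consistent with the stated bound.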

CL · Apr 9, 2020
Injecting Numerical Reasoning Skills into Language Models

Mor Geva, Ankit Gupta, Jonathan Berant

Large pre-trained language models (LMs) are known to encode substantial amounts of linguistic information. However, high-level reasoning skills, such as numerical reasoning, are difficult to learn from a language-modeling objective only. Consequently, existing models for numerical reasoning have used specialized architectures with limited flexibility. In this work, we show that numerical reasoning is amenable to automatic data generation, and thus one can inject this skill into pre-trained LMs, by generating large amounts of data, and training in a multi-task setup. We show that pre-training our model, GenBERT, on this data, dramatically improves performance on DROP (49.3 $\rightarrow$ 72.3 F1), reaching performance that matches state-of-the-art models of comparable size, while using a simple and general-purpose encoder-decoder architecture. Moreover, GenBERT generalizes well to math word problem datasets, while maintaining high performance on standard RC tasks. Our approach provides a general recipe for injecting skills into large pre-trained LMs, whenever the skill is amenable to automatic data augmentation.

CLJan 31, 2020
Break It Down: A Question Understanding Benchmark

Tomer Wolfson, Mor Geva, Ankit Gupta et al.

Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning Representation (QDMR) for questions. QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question. We develop a crowdsourcing pipeline, showing that quality QDMRs can be annotated at scale, and release the Break dataset, containing over 83K pairs of questions and their QDMRs. We demonstrate the utility of QDMR by showing that (a) it can be used to improve open-domain question answering on the HotpotQA dataset, (b) it can be deterministically converted to a pseudo-SQL formal language, which can alleviate annotation in semantic parsing applications. Last, we use Break to train a sequence-to-sequence model with copying that parses questions into QDMR structures, and show that it substantially outperforms several natural baselines.

SPJul 24, 2019
HeartFit: An Accurate Platform for Heart Murmur Diagnosis Utilizing Deep Learning

Ankit Gupta, George Tang, Sylesh Suresh

Cardiovascular disease (CD) is the leading cause of death worldwide, accounting for more than 17 million deaths in 2015. Critical indicators of CD include heart murmurs, intense sounds emitted by the heart during periods of irregular blood flow. Current diagnosis of heart murmurs relies on echocardiography (ECHO), which costs thousands of dollars and requires medical professionals to analyze the results, making it unsuitable for areas with inadequate medical facilities. Thus, there is a need for an accessible alternative. Based on a simple interface and deep learning, HeartFit allows users to administer diagnoses themselves. An inexpensive, custom-designed stethoscope in conjunction with a mobile application allows users to record and upload audio of their heart to a database. Using a deep learning network architecture, the database classifies the audio and returns the diagnosis to the user. The model consists of a deep recurrent convolutional neural network trained on 300 prelabeled heartbeat audio samples. When validated on a previously unseen set of 100 heartbeat audio samples, the model achieved an F-beta score of 0.9545 and an accuracy of 95.5 percent. This accuracy exceeds that of clinical examination, which is around 83 to 91 percent, and the platform costs orders of magnitude less than ECHO, demonstrating the effectiveness of HeartFit. Through the platform, users can obtain immediate, accurate diagnosis of heart murmurs without any professional medical assistance, revolutionizing how we combat CD.

CRJul 23, 2019
CAMLPAD: Cybersecurity Autonomous Machine Learning Platform for Anomaly Detection

Ayush Hariharan, Ankit Gupta, Trisha Pal

As machine learning and cybersecurity continue to explode in the context of the digital ecosystem, the complexity of cybersecurity data combined with complicated and evasive machine learning algorithms leads to vast difficulties in designing an end-to-end system for intelligent, automatic anomaly classification. On the other hand, traditional systems use elementary statistical techniques and are often inaccurate, leading to weak centralized data analysis platforms. In this paper, we propose a novel system that addresses these two problems, titled CAMLPAD, for Cybersecurity Autonomous Machine Learning Platform for Anomaly Detection. The CAMLPAD system's streamlined, holistic approach begins with retrieving many different types of cybersecurity data in real time using Elasticsearch, then running several machine learning algorithms, namely Isolation Forest, Histogram-Based Outlier Score (HBOS), Cluster-Based Local Outlier Factor (CBLOF), and K-Means Clustering, to process the data. Next, the calculated anomalies are visualized using Kibana and are assigned an outlier score, which indicates whether an alert about potential anomalies in the network should be sent to the system administrator. After comprehensive testing of our platform in a simulated environment, the CAMLPAD system achieved an adjusted Rand score of 95 percent, exhibiting the reliable accuracy and precision of the system. All in all, the CAMLPAD system provides an accurate, streamlined approach to real-time cybersecurity anomaly detection, delivering a novel solution that has the potential to revolutionize the cybersecurity sector.

HCJul 18, 2019
User-Interactive Machine Learning Model for Identifying Structural Relationships of Code Features

Ankit Gupta

Traditional machine-learning-based intelligent systems assist users by learning patterns in data and making recommendations. However, these systems are limited in that the user has little means of understanding the rationale behind the system's suggestions, communicating their own understanding of patterns, or correcting system behavior. In this project, we outline a model for intelligent software based on a human-computer feedback loop. The Machine Learning (ML) system's recommendations are reviewed by the user, and in turn, this information shapes the system's decision making. Our model was applied to developing an HTML editor that integrates ML with user interaction to ascertain structural relationships between HTML document features and apply them for code completion. The editor utilizes the ID3 algorithm to build decision trees, sequences of rules for predicting code the user will type. The editor displays the decision trees' rules in the Interactive Rules Interface System (IRIS), which allows developers to prioritize, modify, or delete them. These interactions alter the data processed by ID3, providing the developer some control over the autocomplete system. Validation indicates that, absent user interaction, the ML model is able to predict tags with 78.4 percent accuracy, attributes with 62.9 percent accuracy, and values with 12.8 percent accuracy. Based on the results of the user study, user interaction with the rules interface corrects feature relationships missed or mistaken by the automated process, enhancing autocomplete accuracy and developer productivity. Additionally, interaction is shown to help developers work with greater awareness of code patterns. Our research demonstrates the viability of a software integration of machine intelligence with human feedback.
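
For readers unfamiliar with ID3, the sketch below shows the information-gain criterion that the algorithm uses to choose which feature to split on at each node. The toy rows (a parent tag and a previous sibling tag used to predict the next tag) and the feature names are hypothetical and are not drawn from the editor or its data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical training rows: context features and the tag typed next.
rows = [
    {"parent": "ul", "prev": "li"},
    {"parent": "ul", "prev": "none"},
    {"parent": "table", "prev": "tr"},
    {"parent": "table", "prev": "none"},
]
labels = ["li", "li", "tr", "tr"]

# ID3 splits on the feature with the highest information gain.
best = max(["parent", "prev"], key=lambda f: information_gain(rows, labels, f))
print(best)  # parent
```

In this toy data, splitting on `parent` separates the two tags perfectly (gain of 1 bit), while `prev` leaves the `none` group mixed (gain of 0.5 bits), so `parent` becomes the root rule.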

LGJul 17, 2019
AquaSight: Automatic Water Impurity Detection Utilizing Convolutional Neural Networks

Ankit Gupta, Elliott Ruebush

According to the United Nations World Water Assessment Programme, every day, 2 million tons of sewage and industrial and agricultural waste are discharged into the world's water. In order to address this pervasive issue of increasing water pollution, while ensuring that the global population has an efficient, accurate, and low-cost method to assess whether the water they drink is contaminated, we propose AquaSight, a novel mobile application that utilizes deep learning methods, specifically Convolutional Neural Networks, for automated water impurity detection. After comprehensive training with a dataset of 105 images representing varying magnitudes of contamination, the deep learning algorithm achieved 96 percent accuracy and a loss of 0.108. Furthermore, the machine learning model analyzes the turbidity and transparency of a water sample to estimate its level of contamination. When deployed, the AquaSight system will provide an efficient way for individuals to obtain an estimate of water quality, alerting local and national governments to take action and potentially saving millions of lives worldwide.

CVJul 9, 2019
StrokeSave: A Novel, High-Performance Mobile Application for Stroke Diagnosis using Deep Learning and Computer Vision

Ankit Gupta

According to the WHO, Cerebrovascular Stroke, or CS, is the second largest cause of death worldwide. Current diagnosis of CS relies on labor- and cost-intensive neuroimaging techniques, unsuitable for areas with inadequate access to quality medical facilities. Thus, there is a great need for an efficient diagnosis alternative. StrokeSave is a platform for users to self-diagnose stroke. The mobile app is continuously updated with heart rate, blood pressure, and blood oxygen data from sensors on the patient's wrist. Once these measurements reach a threshold for possible stroke, the patient takes facial images and vocal recordings to screen for paralysis attributed to stroke. A custom-designed lens attached to a phone's camera then takes retinal images, which the deep learning model classifies based on the presence of retinopathy before sending a comprehensive diagnosis. The deep learning model, which consists of an RNN trained on 100 audio files of slurred speech, an SVM trained on 410 vascular data points, and a CNN trained on 520 retinopathy images, achieved a holistic accuracy of 95.0 percent when validated on 327 samples. This accuracy exceeds that of clinical examination, which is around 40 to 89 percent, further demonstrating the vital utility of such a medical device. Through this automated platform, users receive efficient, highly accurate diagnosis without professional medical assistance, revolutionizing medical diagnosis of CS and potentially saving millions of lives.

GNOct 3, 2017
Dilated Convolutions for Modeling Long-Distance Genomic Dependencies

Ankit Gupta, Alexander M. Rush

We consider the task of detecting regulatory elements in the human genome directly from raw DNA. Past work has focused on small snippets of DNA, making it difficult to model long-distance dependencies that arise from DNA's 3-dimensional conformation. In order to study long-distance dependencies, we develop and release a novel dataset for a larger-context modeling task. Using this new dataset, we model long-distance interactions using dilated convolutional neural networks, and compare them to standard convolutions and recurrent neural networks. We show that dilated convolutions are effective at modeling the locations of regulatory markers in the human genome, such as transcription factor binding sites, histone modifications, and DNase hypersensitivity sites.
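
To make concrete why dilation helps with long-distance dependencies, the sketch below computes the receptive field of a stack of stride-1 one-dimensional convolutions. The kernel size and the doubling dilation schedule are illustrative assumptions, not the configuration used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input positions) of stacked stride-1 1D convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation positions.
        rf += (kernel_size - 1) * d
    return rf


# Four standard layers versus four layers with doubling dilation (assumed schedule).
print(receptive_field(3, [1, 1, 1, 1]))  # 9
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

With the same number of layers and parameters, the dilated stack covers a context that grows exponentially with depth, whereas the standard stack grows only linearly.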

NAMar 23, 2015
A Comparative Analysis of Tensor Decomposition Models Using Hyper Spectral Image

Ankit Gupta, Ashish Oberoi

Hyper spectral imaging is a remote sensing technology with a variety of applications, such as material identification, space object identification, and planetary exploitation. It deals with capturing a continuum of images of the earth's surface from different angles. Due to the multidimensional nature of the image, multi-way arrays are one possible solution for analyzing hyper spectral data. Such a multi-way array is called a tensor. Our approach applies three decomposition models, LMLRA, BTD, and CPD, to the sample data in order to choose the best decomposition of the data set. The results show that Block Term Decomposition (BTD) is the best tensor model for decomposing the hyper spectral image into resultant factor matrices.