Shivam Agarwal

h-index: 12

6 papers

268 citations

Novelty: 50%

AI Score: 37

Ranked #92,233 of 194,257 authors (top 47%); #17,312 in CL (top 56%)

6 Papers

33.6 | CL | Oct 23, 2024 | Code

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Shansan Gong, Shivam Agarwal, Yizhe Zhang et al.

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that we can convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts. We release a suite of DLMs (127M, 355M, and 7B) capable of generating fluent text, performing in-context learning, filling in the middle without prompt re-ordering, and following instructions: https://github.com/HKUNLP/DiffuLLaMA.

3.3 | CL | May 24, 2023 | Code

Text-Augmented Open Knowledge Graph Completion via Pre-Trained Language Models

Pengcheng Jiang, Shivam Agarwal, Bowen Jin et al.

The mission of open knowledge graph (KG) completion is to draw new findings from known facts. Existing works that augment KG completion require either (1) factual triples to enlarge the graph reasoning space or (2) manually designed prompts to extract knowledge from a pre-trained language model (PLM), exhibiting limited performance and requiring expensive efforts from experts. To this end, we propose TAGREAL, which automatically generates quality query prompts and retrieves support information from large text corpora to probe knowledge from a PLM for KG completion. The results show that TAGREAL achieves state-of-the-art performance on two benchmark datasets. We find that TAGREAL has superb performance even with limited training data, outperforming existing embedding-based, graph-based, and PLM-based methods.

6.4 | HC | Sep 20, 2021

Visually Connecting Historical Figures Through Event Knowledge Graphs

Shahid Latif, Shivam Agarwal, Simon Gottschalk et al.

Knowledge graphs store information about historical figures and their relationships indirectly through shared events. We developed a visualization system, VisKonnect, for analyzing the intertwined lives of historical figures based on the events they participated in. A user's query is parsed for identifying named entities, and related data is retrieved from an event knowledge graph. While a short textual answer to the query is generated using the GPT-3 language model, various linked visualizations provide context, display additional information related to the query, and allow exploration.

13.6 | CR | Jan 28, 2021

S++: A Fast and Deployable Secure-Computation Framework for Privacy-Preserving Neural Network Training

Prashanthi Ramachandran, Shivam Agarwal, Arup Mondal et al.

We introduce S++, a simple, robust, and deployable framework for training a neural network (NN) using private data from multiple sources, using secret-shared secure function evaluation. In short, consider a virtual third party to whom every data-holder sends their inputs, and which computes the neural network: in our case, this virtual third party is actually a set of servers which individually learn nothing, even with a malicious (but non-colluding) adversary. Previous work in this area has been limited to just one specific activation function, ReLU, rendering the approach impractical for many use-cases. For the first time, we provide fast and verifiable protocols for all common activation functions and optimize them for running in a secret-shared manner. The ability to quickly, verifiably, and robustly compute exponentiation, softmax, sigmoid, etc., allows us to use previously written NNs without modification, vastly reducing developer effort and complexity of code. In recent times, ReLU has been found to converge much faster and be more computationally efficient than other non-linear functions like sigmoid or tanh. However, we argue that it would be remiss not to extend the mechanism to non-linear functions such as the logistic sigmoid, tanh, and softmax that are fundamental due to their ability to express outputs as probabilities and their universal approximation property. Their role in RNNs and a few recent advancements also make them more relevant.
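The abstract above does not spell out the secret-sharing scheme S++ uses, so the following is only a minimal sketch of the general idea, assuming plain additive secret sharing over a ring of integers modulo 2^64. The names `MODULUS`, `share`, `reconstruct`, `add_shares`, and `scale_shares` are hypothetical and are not taken from the S++ code. It shows how a set of servers can each hold a random-looking share, and how linear operations can be computed locally on shares without any server seeing the underlying value.

```python
import secrets

# Assumption for illustration: shares live in the ring of integers mod 2^64.
MODULUS = 2 ** 64


def share(value, n_servers=3):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    # All but the last share are uniformly random ring elements.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    # The last share is chosen so that the shares sum to the secret.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS


def add_shares(a, b):
    """Each server adds its two local shares; no communication is needed."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def scale_shares(a, k):
    """Each server multiplies its local share by a public constant k."""
    return [(x * k) % MODULUS for x in a]


if __name__ == "__main__":
    x_shares = share(20)
    y_shares = share(22)
    print(reconstruct(add_shares(x_shares, y_shares)))
    print(reconstruct(scale_shares(x_shares, 3)))
```

Addition and multiplication by a public constant are the easy cases because they are linear. Non-linear functions such as sigmoid, tanh, and softmax cannot be evaluated share-by-share in this way, which is why the abstract describes dedicated protocols for them.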

3.3 | HC | Sep 1, 2020

How Visualization PhD Students Cope with Paper Rejections

Shivam Agarwal, Shahid Latif, Fabian Beck

We conducted a questionnaire study aimed at PhD students in the field of visualization research to understand how they cope with paper rejections. We collected responses from 24 participants and performed a qualitative analysis of the data in relation to the support provided by collaborators, resubmission strategies, handling multiple rejections, and personal impression of the reviews. The results indicate that the PhD students in the visualization community generally cope well with the negative reviews and, with experience, learn how to act accordingly to improve and resubmit their work. Our results reveal the main coping strategies that can be applied for constructively handling rejected visualization papers. The most prominent strategies include: discussing reviews with collaborators and making a resubmission plan, doing a major revision to improve the work, shortening the work, and seeing rejection as a positive learning experience.

2.3 | MM | Mar 30, 2020 | Code

Deep Residual Neural Networks for Image in Speech Steganography

Shivam Agarwal, Siddarth Venkatraman

Steganography is the art of hiding a secret message inside a publicly visible carrier message. Ideally, it is done without modifying the carrier, and with minimal loss of information in the secret message. Recently, various deep-learning-based approaches to steganography have been applied to different message types. We propose a deep-learning-based technique to hide a source RGB image message inside finite-length speech segments without perceptual loss. To achieve this, we train three neural networks: an encoding network to hide the message in the carrier, a decoding network to reconstruct the message from the carrier, and an additional image enhancer network to further improve the reconstructed message. We also discuss future improvements to the proposed algorithm.