Sanjay Singh

CV
h-index: 16
21 papers
16,368 citations
Novelty: 38%
AI Score: 32

21 Papers

AI · Jul 31, 2024
The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri et al. · allen-ai, berkeley

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

CV · Aug 5, 2022
Convolutional Ensembling based Few-Shot Defect Detection Technique

Soumyajit Karmakar, Abeer Banerjee, Prashant Sadashiv Gidde et al.

Over the past few years, there has been a significant improvement in the domain of few-shot learning. This learning paradigm has shown promising results for the challenging problem of anomaly detection, where the general task is to deal with heavy class imbalance. Our paper presents a new approach to few-shot classification, where we employ the knowledge-base of multiple pre-trained convolutional models that act as the backbone for our proposed few-shot framework. Our framework uses a novel ensembling technique for boosting the accuracy while drastically decreasing the total parameter count, thus paving the way for real-time implementation. We perform an extensive hyperparameter search using a power-line defect detection dataset and obtain an accuracy of 92.30% for the 5-way 5-shot task. Without further tuning, we evaluate our model on competing standards with the existing state-of-the-art methods and outperform them.

CV · Aug 17, 2022
ParaColorizer: Realistic Image Colorization using Parallel Generative Networks

Himanshu Kumar, Abeer Banerjee, Sumeet Saurav et al.

Grayscale image colorization is a fascinating application of AI for information restoration. The inherently ill-posed nature of the problem makes it even more challenging since the outputs could be multi-modal. The learning-based methods currently in use produce acceptable results for straightforward cases but usually fail to restore the contextual information in the absence of clear figure-ground separation. Also, the images suffer from color bleeding and desaturated backgrounds since a single model trained on full image features is insufficient for learning the diverse data modes. To address these issues, we present a parallel GAN-based colorization framework. In our approach, each separately tailored GAN pipeline colorizes the foreground (using object-level features) or the background (using full-image features). The foreground pipeline employs a Residual-UNet with self-attention as its generator trained using the full-image features and the corresponding object-level features from the COCO dataset. The background pipeline relies on full-image features and additional training examples from the Places dataset. We design a DenseFuse-based fusion network to obtain the final colorized image by feature-based fusion of the parallelly generated outputs. We show the shortcomings of the non-perceptual evaluation metrics commonly used to assess multi-modal problems like image colorization and perform extensive performance evaluation of our framework using multiple perceptual metrics. Our approach outperforms most of the existing learning-based methods and produces results comparable to the state-of-the-art. Further, we performed a runtime analysis and obtained an average inference time of 24ms per image.

IV · Jul 9, 2024
Towards Physics-informed Cyclic Adversarial Multi-PSF Lensless Imaging

Abeer Banerjee, Sanjay Singh

Lensless imaging has emerged as a promising field within inverse imaging, offering compact, cost-effective solutions with the potential to revolutionize the computational camera market. By circumventing traditional optical components like lenses and mirrors, novel approaches like mask-based lensless imaging eliminate the need for conventional hardware. However, advancements in lensless image reconstruction, particularly those leveraging Generative Adversarial Networks (GANs), are hindered by the reliance on data-driven training processes, resulting in network specificity to the Point Spread Function (PSF) of the imaging system. This necessitates a complete retraining for minor PSF changes, limiting adaptability and generalizability across diverse imaging scenarios. In this paper, we introduce a novel approach to multi-PSF lensless imaging, employing a dual discriminator cyclic adversarial framework. We propose a unique generator architecture with a sparse convolutional PSF-aware auxiliary branch, coupled with a forward model integrated into the training loop to facilitate physics-informed learning to handle the substantial domain gap between lensless and lensed images. Comprehensive performance evaluation and ablation studies underscore the effectiveness of our model, offering robust and adaptable lensless image reconstruction capabilities. Our method achieves comparable performance to existing PSF-agnostic generative methods for single PSF cases and demonstrates resilience to PSF changes without the need for retraining.

CV · Jan 10, 2025 · Code
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction

Naval Kishore Mehta, Arvind, Himanshu Kumar et al.

Detecting and interpreting operator actions, engagement, and object interactions in dynamic industrial workflows remains a significant challenge in human-robot collaboration research, especially within complex, real-world environments. Traditional unimodal methods often fall short of capturing the intricacies of these unstructured industrial settings. To address this gap, we present a novel Multimodal Industrial Activity Monitoring (MIAM) dataset that captures realistic assembly and disassembly tasks, facilitating the evaluation of key meta-tasks such as action localization, object interaction, and engagement prediction. The dataset comprises multi-view RGB, depth, and Inertial Measurement Unit (IMU) data collected from 22 sessions, amounting to 290 minutes of untrimmed video, annotated in detail for task performance and operator behavior. Its distinctiveness lies in the integration of multiple data modalities and its emphasis on real-world, untrimmed industrial workflows, which are key for advancing research in human-robot collaboration and operator monitoring. Additionally, we propose a multimodal network that fuses RGB frames, IMU data, and skeleton sequences to predict engagement levels during industrial tasks. Our approach improves the accuracy of recognizing engagement states, providing a robust solution for monitoring operator performance in dynamic industrial environments. The dataset and code can be accessed from https://github.com/navalkishoremehta95/MIAM/.

CV · Mar 5, 2024
Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks

Abeer Banerjee, Naval K. Mehta, Shyam S. Prasad et al.

In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding method seamlessly integrates Dynamic Vision Sensor (DVS) events with grayscale guide frames, generating consecutively encoded images for input into our neural network. This unique solution not only captures diverse gaze responses from participants within the active age group but also introduces a curated dataset tailored for low-light conditions. The encoded temporal frames paired with our network showcase impressive spatial localization and reliable gaze direction in their predictions. Achieving a remarkable 100-pixel accuracy of 100%, our research underscores the potency of our neural network to work with temporally consecutive encoded images for precise gaze vector predictions in challenging low-light videos, contributing to the advancement of gaze prediction technologies.

CV · Jan 9, 2025
Optimizing Multitask Industrial Processes with Predictive Action Guidance

Naval Kishore Mehta, Arvind, Shyam Sunder Prasad et al.

Monitoring complex assembly processes is critical for maintaining productivity and ensuring compliance with assembly standards. However, variability in human actions and subjective task preferences complicate accurate task anticipation and guidance. To address these challenges, we introduce the Multi-Modal Transformer Fusion and Recurrent Units (MMTF-RU) Network for egocentric activity anticipation, utilizing multimodal fusion to improve prediction accuracy. Integrated with the Operator Action Monitoring Unit (OAMU), the system provides proactive operator guidance, preventing deviations in the assembly process. OAMU employs two strategies: (1) Top-5 MMTF-RU predictions, combined with a reference graph and an action dictionary, for next-step recommendations; and (2) Top-1 MMTF-RU predictions, integrated with a reference graph, for detecting sequence deviations and predicting anomaly scores via an entropy-informed confidence mechanism. We also introduce Time-Weighted Sequence Accuracy (TWSA) to evaluate operator efficiency and ensure timely task completion. Our approach is validated on the industrial Meccano dataset and the large-scale EPIC-Kitchens-55 dataset, demonstrating its effectiveness in dynamic environments.

IV · Nov 27, 2024
Towards Lensless Image Deblurring with Prior-Embedded Implicit Neural Representations in the Low-Data Regime

Abeer Banerjee, Sanjay Singh

The field of computational imaging has witnessed a promising paradigm shift with the emergence of untrained neural networks, offering novel solutions to inverse computational imaging problems. While existing techniques have demonstrated impressive results, they often operate either in the high-data regime, leveraging Generative Adversarial Networks (GANs) as image priors, or through untrained iterative reconstruction in a data-agnostic manner. This paper delves into lensless image reconstruction, a subset of computational imaging that replaces traditional lenses with computation, enabling the development of ultra-thin and lightweight imaging systems. To the best of our knowledge, we are the first to leverage implicit neural representations for lensless image deblurring, achieving reconstructions without the requirement of prior training. We perform prior-embedded untrained iterative optimization to enhance reconstruction performance and speed up convergence, effectively bridging the gap between the no-data and high-data regimes. Through a thorough comparative analysis encompassing various untrained and low-shot methods, including under-parameterized non-convolutional methods and domain-restricted low-shot methods, we showcase the superior performance of our approach by a significant margin.

CL · Oct 24, 2024
Supporting Assessment of Novelty of Design Problems Using Concept of Problem SAPPhIRE

Sanjay Singh, Amaresh Chakrabarti

This paper proposes a framework for assessing the novelty of design problems using the SAPPhIRE model of causality. The novelty of a problem is measured as its minimum distance from the problems in a reference problem database. The distance is calculated by comparing the current problem and each reference past problem at the various levels of abstraction in the SAPPhIRE ontology. The basis for comparison is textual similarity. To demonstrate the applicability of the proposed framework, the current set of problems associated with an artifact, as collected from its stakeholders, was compared with the past set of problems, as collected from patents and other web sources, to assess the novelty of the current set. This approach is aimed at providing a better understanding of the degree of novelty of any given set of current problems by comparing them to similar problems available from historical records. Since manual assessment, the current mode of such assessments as reported in the literature, is a tedious process, an automated assessment is proposed and used in this paper to reduce time complexity and to afford better applicability to larger sets of problem statements.
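The minimum-distance idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which textual similarity measure is used, so token-level Jaccard overlap stands in for it here, and the level names ("action", "state") and the per-level averaging are hypothetical choices.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; an assumed stand-in for the
    paper's unspecified textual similarity measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def distance(problem: dict, reference: dict) -> float:
    """Average (1 - similarity) over the abstraction levels both problems
    describe. Averaging across levels is an assumption."""
    levels = problem.keys() & reference.keys()
    if not levels:
        return 1.0
    return sum(1.0 - jaccard(problem[lv], reference[lv]) for lv in levels) / len(levels)


def novelty(problem: dict, reference_db: list) -> float:
    """Novelty as the minimum distance to any problem in the reference database."""
    if not reference_db:
        return 1.0
    return min(distance(problem, ref) for ref in reference_db)


# Hypothetical problem descriptions keyed by illustrative level names.
current = {"action": "reduce noise", "state": "loud motor"}
known = [{"action": "reduce noise", "state": "loud motor"}]
unrelated = [{"action": "increase speed", "state": "slow belt"}]
```

With these inputs, `novelty(current, known)` is 0.0 because an identical problem already exists, while `novelty(current, unrelated)` is 1.0 because no tokens overlap at any level.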

RO · Jun 27, 2024
Autonomous Control of a Novel Closed Chain Five Bar Active Suspension via Deep Reinforcement Learning

Nishesh Singh, Sidharth Ramesh, Abhishek Shankar et al.

Planetary exploration requires traversal in environments with rugged terrains. In addition, Mars rovers and other planetary exploration robots often carry sensitive scientific experiments and components onboard, which must be protected from mechanical harm. This paper deals with an active suspension system focused on chassis stabilisation and an efficient traversal method while encountering unavoidable obstacles. Soft Actor-Critic (SAC) was applied along with Proportional Integral Derivative (PID) control to stabilise the chassis and traverse large obstacles at low speeds. The model uses the rover's distance from surrounding obstacles, the height of the obstacle, and the chassis' orientation to actuate the control links of the suspension accurately. Simulations carried out in the Gazebo environment are used to validate the proposed active system.

CL · Jun 16, 2021
Evaluating Gender Bias in Hindi-English Machine Translation

Gauri Gupta, Krithika Ramesh, Sanjay Singh

With language models being deployed increasingly in the real world, it is essential to address the issue of the fairness of their outputs. The word embedding representations of these language models often implicitly draw unwanted associations that form a social bias within the model. The nature of gendered languages like Hindi poses an additional problem for the quantification and mitigation of bias, owing to the change in the form of the words in a sentence based on the gender of the subject. Additionally, little work has been done on measuring and debiasing systems for Indic languages. In our work, we attempt to evaluate and quantify the gender bias within a Hindi-English machine translation system. We implement a modified version of the existing TGBI metric based on the grammatical considerations for Hindi. We also compare and contrast the resulting bias measurements across multiple metrics for pre-trained embeddings and the ones learned by our machine translation model.

CV · Jan 10, 2020
Compressive sensing based privacy for fall detection

Ronak Gupta, Prashant Anand, Santanu Chaudhury et al.

Fall detection holds immense importance in the field of healthcare, where timely detection allows for instant medical assistance. In this context, we propose a 3D ConvNet architecture which consists of 3D Inception modules for fall detection. The proposed architecture is a custom version of Inflated 3D (I3D) architecture, that takes compressed measurements of video sequence as spatio-temporal input, obtained from compressive sensing framework, rather than video sequence as input, as in the case of I3D convolutional neural network. This is adopted since privacy raises a huge concern for patients being monitored through these RGB cameras. The proposed framework for fall detection is flexible enough with respect to a wide variety of measurement matrices. Ten action classes randomly selected from Kinetics-400 with no fall examples, are employed to train our 3D ConvNet post compressive sensing with different types of sensing matrices on the original video clips. Our results show that 3D ConvNet performance remains unchanged with different sensing matrices. Also, the performance obtained with Kinetics pre-trained 3D ConvNet on compressively sensed fall videos from benchmark datasets is better than the state-of-the-art techniques.

SI · Aug 14, 2015
Is Stack Overflow Overflowing With Questions and Tags

Ranjitha R. K., Sanjay Singh

Programming question and answer (Q&A) websites, such as Quora, Stack Overflow, and Yahoo! Answers, help us understand programming concepts easily and quickly in a way that has been tested and applied by many software developers. Stack Overflow is one of the most frequently used programming Q&A websites, where the questions and answers posted are presently analyzed manually, which requires a huge amount of time and resources. To save this effort, we present a topic-modeling-based technique that analyzes the words of the original texts to discover the themes that run through them. We also propose a method to automate the process of reviewing the quality of questions in the Stack Overflow dataset in order to avoid ballooning Stack Overflow with insignificant questions. The proposed method also recommends appropriate tags for a new post, which averts the creation of unnecessary tags on Stack Overflow.

IR · Mar 23, 2014
A Novel Method to Calculate Click Through Rate for Sponsored Search

Rahul Gupta, Gitansh Khirbat, Sanjay Singh

Sponsored search adopts the generalized second price (GSP) auction mechanism, which works on the pay-per-click concept and is most commonly used for allocating slots on the search results page. Two main aspects associated with GSP are the bidding amount and the click-through rate (CTR). The CTR learning algorithms currently in use work on the basic principle of (#clicks_i / #impressions_i) under a fixed window of clicks, impressions, or time. CTR is prone to fraudulent clicks, which result in a sudden increase in CTR. The current algorithms are unable to stop this, although machine learning algorithms can detect that fraudulent clicks are being generated. In our paper, we use the concept of relative ranking, which works on the basic principle of (#clicks_i / #clicks_t). In this algorithm, both the numerator and the denominator are linked. As #clicks_t is higher than in previous algorithms and is linked to #clicks_i, the small change in clicks that occurs in the normal scenario produces a very small change in the result, but in the case of fraudulent clicks the number of clicks increases or decreases rapidly, which adds to the normal clicks to increase the denominator, thereby decreasing the CTR.
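The two ratios named in this abstract can be compared with a small sketch. It rests on an assumption the abstract does not spell out: #clicks_t is taken here to be the total number of clicks across all competing ads in the window. The ad names and counts are made up for illustration, and the code only shows how a click burst moves each ratio; it does not reproduce the paper's algorithm or its fraud handling.

```python
def window_ctr(clicks_i: int, impressions_i: int) -> float:
    """Conventional estimate: #clicks_i / #impressions_i within a fixed window."""
    return clicks_i / impressions_i if impressions_i else 0.0


def relative_ctr(clicks: dict, ad: str) -> float:
    """Relative estimate: #clicks_i / #clicks_t, where #clicks_t is ASSUMED
    to be the total clicks over all ads in the window."""
    total = sum(clicks.values())
    return clicks[ad] / total if total else 0.0


# Hypothetical window: three ads, 100 impressions each.
normal = {"a": 10, "b": 30, "c": 60}
# Same window after a burst of 90 extra clicks on ad "a".
burst = {"a": 100, "b": 30, "c": 60}

before_window, after_window = window_ctr(normal["a"], 100), window_ctr(burst["a"], 100)
before_relative, after_relative = relative_ctr(normal, "a"), relative_ctr(burst, "a")
```

Because the burst also inflates the denominator, the relative estimate for ad "a" rises from 0.1 to 100/190 (about 0.53), whereas the window estimate rises from 0.1 to 1.0.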

IR · Jan 11, 2014
Design and Development of a User Specific Dynamic E-Magazine

Vikram Santhalia, Sanjay Singh

With the Internet and electronic media gaining popularity due to their ease and speed, the number of Internet users has increased tremendously. The world is moving faster each day, with several events taking place at once, and the Internet is flooded with information in every field. This information ranges from highly relevant to a given user to less relevant or totally irrelevant. In such a scenario, getting the information that is most relevant to the user is indispensable for saving time. The motivation of our solution is the idea of optimizing the search for information automatically. This information is delivered to the user in the form of an interactive GUI. The optimization of the contents served to him is based on his social networking profiles and on his reading habits within the proposed solution. The aim is to obtain the user's profile information from his social networking profile, considering that almost every Internet user has one. This helps us personalize the contents delivered to the user in order to produce what is most relevant to him, in the form of a personalized e-magazine. Further, the proposed solution learns the user's reading habits, for example the news he saves or clicks the most, and uses them to decide which contents to provide.

SI · Aug 18, 2013
Detection and Filtering of Collaborative Malicious Users in Reputation System using Quality Repository Approach

Jnanamurthy HK, Sanjay Singh

Online reputation systems are gaining popularity as they help a user be sure about the quality of a product or service he wants to buy. Nonetheless, online reputation systems are not immune to attack. Dealing with malicious ratings in reputation systems has been recognized as an important but difficult task. This problem is challenging when the number of true users' ratings is relatively small and unfair ratings make up the majority of rated values. In this paper, we propose a new method to find malicious users in online reputation systems using the Quality Repository Approach (QRA). We mainly concentrate on anomaly detection in both rating values and malicious users. QRA is very efficient at detecting malicious user ratings and aggregating true ratings. The proposed reputation system has been evaluated through simulations, and it is concluded that the QRA-based system significantly reduces the impact of unfair ratings and improves trust in the reputation score, with fewer false positives than other methods used for the purpose.

AI · Sep 19, 2012
Modeling and Verification of a Multi-Agent Argumentation System using NuSMV

Supriya D'Souza, Abhishek Rao, Amit Sharma et al.

Autonomous intelligent agent research is a domain situated at the forefront of artificial intelligence. Interest-based negotiation (IBN) is a form of negotiation in which agents exchange information about their underlying goals, with a view to improving the likelihood and quality of an offer. In this paper we model and verify a multi-agent argumentation scenario of a resource sharing mechanism to enable resource sharing in a distributed system. We use IBN in our model, wherein agents express their interests to the others in the society to gain certain resources.

CR · Sep 11, 2012
Two Way Concurrent Buffer System without Deadlock in Various Time Models Using Timed Automata

Rohit Mishra, Md Zeeshan, Sanjay Singh

A two way buffer system is a system that transfers data using two buffers concurrently. It includes processes that synchronize to exchange data with each other, with certain delays between these synchronizations. In the existing Tiny Two Way Buffer System, both generators produce packets in half-duplex manner under no time, deterministic time, and non-deterministic time. Analysis of the model under these time options shows that it deadlocks. The model can avoid deadlock if its timings are arranged in an alternating fashion. The generators produce packets after a delay of 10 seconds. However, generator one has an initial shift of 5 seconds, after which it begins sending a packet every 10 seconds. Hence, the initial delay for generator one is 15 seconds and for generator two it is 10 seconds. Due to this initial shift, the generators produce packets alternately and the system is deadlock free, as the packets do not meet at the same time instant. Moreover, the existing system model is not concurrent and hence takes more time for packet transfer in every iteration. In this paper we propose a model of the buffer system that uses an additional dummy buffer for transferring data packets, and we check the model under various time models: no time, deterministic time, and non-deterministic time. Under these time models, the proposed model deadlocks. We achieve a deadlock-free situation by introducing appropriate non-deterministic delays in the various buffers of the proposed system. The new approach speeds up the transfer of packets; as a result, the transfer of data becomes concurrent and deadlock free, and hence the proposed model is time efficient. Simulation results show that the proposed two way buffer system is fully concurrent and time efficient compared to the existing buffer system.
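The timing argument for the existing system (first packets at 15 s and 10 s, then one packet every 10 s from each generator) can be checked with simple arithmetic. The sketch below is plain Python rather than a timed-automata model, and the function and variable names are illustrative only.

```python
def emission_times(initial_delay: int, period: int, count: int) -> list:
    """Times (in seconds) at which a generator emits its first `count` packets."""
    return [initial_delay + k * period for k in range(count)]


# Delays taken from the abstract: generator one starts at 15 s, generator two at 10 s.
gen_one = emission_times(initial_delay=15, period=10, count=5)
gen_two = emission_times(initial_delay=10, period=10, count=5)

# Instants at which both generators would emit simultaneously.
collisions = set(gen_one) & set(gen_two)

# Merge the two schedules and record which generator fires at each instant.
schedule = sorted([(t, "one") for t in gen_one] + [(t, "two") for t in gen_two])
order = [name for _, name in schedule]
alternates = all(a != b for a, b in zip(order, order[1:]))
```

Generator one fires at 15, 25, 35, ... and generator two at 10, 20, 30, ..., so `collisions` is empty and `alternates` is true: the 5-second shift keeps the packets from meeting at the same instant.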

SE · Aug 16, 2012
Modeling and Verification of Agent based Adaptive Traffic Signal using Symbolic Model Verifier

Vivek Vishal, Sagar Gugwad, Sanjay Singh

This paper addresses the modeling and verification of a Multi-Agent System (MAS) scenario. We consider an agent-based adaptive traffic signal system. The system monitors the smooth flow of traffic at the intersection of two road segments. After describing how the adaptive traffic signal system can be used efficiently and showing its advantages over traffic signals with predetermined periods, we show how this scenario can be transformed into a Finite State Machine (FSM). Once the system is transformed into an FSM, we verify the specifications, expressed in Computation Tree Logic (CTL), using NuSMV as the model checking tool. Simulation results obtained from NuSMV show whether the system satisfies the specifications. They also show the state in which a specification does not hold, which we use to trace back through the system to find the source of the violation. Finally, we verify the modified system against its specifications again with NuSMV.
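The workflow described here (encode the signal as an FSM, check a specification, inspect the violating state) can be illustrated with a toy explicit-state check. Everything below is hypothetical: the four-state signal cycle is not the paper's model, and a Python reachability search merely stands in for NuSMV. For an invariant, visiting every reachable state corresponds to a CTL property of the form AG p.

```python
from collections import deque

# Toy FSM: each state is (north_south_light, east_west_light). Hypothetical model.
TRANSITIONS = {
    ("green", "red"): [("yellow", "red")],
    ("yellow", "red"): [("red", "green")],
    ("red", "green"): [("red", "yellow")],
    ("red", "yellow"): [("green", "red")],
}
INITIAL = ("green", "red")


def reachable(initial):
    """Breadth-first search over the transition relation."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in TRANSITIONS.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def find_violation(initial, invariant):
    """Return a reachable state that breaks the invariant, or None if it
    holds in every reachable state (the AG p case)."""
    for state in sorted(reachable(initial)):
        if not invariant(state):
            return state
    return None


def never_both_green(state):
    """Safety property: the two directions are never green at the same time."""
    return state != ("green", "green")
```

Calling `find_violation(INITIAL, never_both_green)` returns `None` for this toy cycle. A property that fails, such as "north-south is never yellow", returns the offending state, which plays the role of the counterexample state the abstract mentions.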

CR · Aug 8, 2012
Ownership Authentication Transfer Protocol for Ubiquitous Computing Devices

Pradeep B. H., Sanjay Singh

In ubiquitous computing, users tend to store valuable information on their devices. Even though a device can be borrowed temporarily by another user, it is not safe for any user to borrow or lend a device, as this may result in the user's private data becoming public. To safeguard user data and preserve user privacy, we propose the technique of ownership authentication transfer. A user who is willing to sell a device has to transfer the ownership of the device under sale. Once the device is sold and the ownership has been transferred, the old owner will no longer be able to use that device. Neither user will be able to use the device if the ownership transfer process has not been carried out properly. This also takes care of the scenario in which the device has been stolen or lost, avoiding the impersonation attack. The proposed protocol has been modeled and verified using Automated Validation of Internet Security Protocols and Applications (AVISPA) and is found to be safe.

CR · Jun 5, 2012
Privacy Preserving and Ownership Authentication in Ubiquitous Computing Devices using Secure Three Way Authentication

Pradeep B. H., Sanjay Singh

In today's world of technology and gadgets, almost every person has a portable device, be it a laptop or a smartphone. The user would like to have all services at his fingertips and access them through the portable device he owns. He may want some data from a fellow user or from the service provider, or he may want to control his smart devices at home from wherever he is. In the present era of mobile environments, interactions between the user device and the service provider must be secure regardless of the type of device used to access or utilize the services. In this paper we propose a "Secure Three Way Authentication (STWA)" technique intended to preserve user privacy and to accomplish ownership authentication in order to securely deliver services to user devices. This technique will also help users or service providers check whether a device is compromised, with the help of the encrypted pass-phrases that are exchanged.