Anubhav Gupta

CV
h-index44
12papers
193citations
Novelty43%
AI Score42

12 Papers

CVSep 10, 2024
LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation

Archana Swaminathan, Anubhav Gupta, Kamal Gupta et al.

Neural Radiance Fields (NeRFs) have revolutionized the reconstruction of static scenes and objects in 3D, offering unprecedented quality. However, extending NeRFs to model dynamic objects or object articulations remains a challenging problem. Previous works have tackled this issue by focusing on part-level reconstruction and motion estimation for objects, but they often rely on heuristics regarding the number of moving parts or object categories, which can limit their practical use. In this work, we introduce LEIA, a novel approach for representing dynamic 3D objects. Our method involves observing the object at distinct time steps or "states" and conditioning a hypernetwork on the current state, using this to parameterize our NeRF. This approach allows us to learn a view-invariant latent representation for each state. We further demonstrate that by interpolating between these states, we can generate novel articulation configurations in 3D space that were previously unseen. Our experimental results highlight the effectiveness of our method in articulating objects in a manner that is independent of the viewing angle and joint configuration. Notably, our approach outperforms previous methods that rely on motion information for articulation registration.

CVAug 5, 2024
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

Shishira R Maiya, Anubhav Gupta, Matthew Gwilliam et al.

Implicit Neural Networks (INRs) have emerged as powerful representations to encode all forms of data, including images, videos, audios, and scenes. With video, many INRs for video have been proposed for the compression task, and recent methods feature significant improvements with respect to encoding time, storage, and reconstruction quality. However, these encoded representations lack semantic meaning, so they cannot be used for any downstream tasks that require such properties, such as retrieval. This can act as a barrier for adoption of video INRs over traditional codecs as they do not offer any significant edge apart from compression. To alleviate this, we propose a flexible framework that decouples the spatial and temporal aspects of the video INR. We accomplish this with a dictionary of per-frame latents that are learned jointly with a set of video specific hypernetworks, such that given a latent, these hypernetworks can predict the INR weights to reconstruct the given frame. This framework not only retains the compression efficiency, but the learned latents can be aligned with features from large vision models, which grants them discriminative properties. We align these latents with CLIP and show good performance for both compression and video retrieval tasks. By aligning with VideoLlama, we are able to perform open-ended chat with our learned latents as the visual inputs. Additionally, the learned latents serve as a proxy for the underlying weights, allowing us perform tasks like video interpolation. These semantic properties and applications, existing simultaneously with ability to perform compression, interpolation, and superresolution properties, are a first in this field of work.

14.1SYMay 12Code
Basilisk and Docker for Reproducible GN&C Simulation: A Workflow Reference

Anubhav Gupta

Basilisk is an open-source astrodynamics simulation framework widely used for spacecraft guidance, navigation, and control (GN&C) research and development. Despite its flexibility and computational capabilities, configuring Basilisk consistently across heterogeneous development environments presents practical challenges due to dependency management, operating system compatibility, and software configuration requirements. This paper presents a Docker-based containerization workflow for Basilisk that encapsulates the complete build environment, dependencies, and simulation infrastructure within a portable container image. The workflow is demonstrated through a progression of simulation scenarios of increasing complexity, from standalone orbital dynamics scripts to BSKSim-based attitude dynamics and control simulations with Monte Carlo analysis. The BSKSim class hierarchy, dynamics model architecture, flight software implementation, and scenario execution patterns are described in detail. The presented workflow provides a self-contained implementation reference for GN&C engineers and researchers seeking reproducible and portable Basilisk simulation environments. This work expands upon a workshop presentation delivered at the 46th Rocky Mountain AAS GN&C Conference, February 2024, available at https://doi.org/10.5281/zenodo.15008785.

CVApr 1, 2024Code
Measuring Style Similarity in Diffusion Models

Gowthami Somepalli, Anubhav Gupta, Kamal Gupta et al. · microsoft-research

Generative models are now widely used by graphic designers and artists. Prior works have shown that these models remember and often replicate content from their training data during generation. Hence as their proliferation increases, it has become important to perform a database search to determine whether the properties of the image are attributable to specific training data, every time before a generated image is used for professional purposes. Existing tools for this purpose focus on retrieving images of similar semantic content. Meanwhile, many artists are concerned with style replication in text-to-image models. We present a framework for understanding and extracting style descriptors from images. Our framework comprises a new dataset curated using the insight that style is a subjective property of an image that captures complex yet meaningful interactions of factors including but not limited to colors, textures, shapes, etc. We also propose a method to extract style descriptors that can be used to attribute style of a generated image to the images used in the training dataset of a text-to-image model. We showcase promising results in various style retrieval tasks. We also quantitatively and qualitatively analyze style attribution and matching in the Stable Diffusion model. Code and artifacts are available at https://github.com/learn2phoenix/CSD.

IVJul 20, 2024
MedMAE: A Self-Supervised Backbone for Medical Imaging Tasks

Anubhav Gupta, Islam Osman, Mohamed S. Shehata et al.

Medical imaging tasks are very challenging due to the lack of publicly available labeled datasets. Hence, it is difficult to achieve high performance with existing deep-learning models as they require a massive labeled dataset to be trained effectively. An alternative solution is to use pre-trained models and fine-tune them using the medical imaging dataset. However, all existing models are pre-trained using natural images, which is a completely different domain from that of medical imaging, which leads to poor performance due to domain shift. To overcome these problems, we propose a large-scale unlabeled dataset of medical images and a backbone pre-trained using the proposed dataset with a self-supervised learning technique called Masked autoencoder. This backbone can be used as a pre-trained model for any medical imaging task, as it is trained to learn a visual representation of different types of medical images. To evaluate the performance of the proposed backbone, we used four different medical imaging tasks. The results are compared with existing pre-trained models. These experiments show the superiority of our proposed backbone in medical imaging tasks.

CVNov 2, 2021Code
PatchGame: Learning to Signal Mid-level Patches in Referential Games

Kamal Gupta, Gowthami Somepalli, Anubhav Gupta et al.

We study a referential game (a type of signaling game) where two agents communicate with each other via a discrete bottleneck to achieve a common goal. In our referential game, the goal of the speaker is to compose a message or a symbolic representation of "important" image patches, while the task for the listener is to match the speaker's message to a different view of the same image. We show that it is indeed possible for the two agents to develop a communication protocol without explicit or implicit supervision. We further investigate the developed protocol and show the applications in speeding up recent Vision Transformers by using only important patches, and as pre-training for downstream recognition tasks (e.g., classification). Code available at https://github.com/kampta/PatchGame.

ROMay 9, 2025
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations

Shuaiyi Huang, Mara Levy, Anubhav Gupta et al.

Preference feedback collected by human or VLM annotators is often noisy, presenting a significant challenge for preference-based reinforcement learning that relies on accurate preference labels. To address this challenge, we propose TREND, a novel framework that integrates few-shot expert demonstrations with a tri-teaching strategy for effective noise mitigation. Our method trains three reward models simultaneously, where each model views its small-loss preference pairs as useful knowledge and teaches such useful pairs to its peer network for updating the parameters. Remarkably, our approach requires as few as one to three expert demonstrations to achieve high performance. We evaluate TREND on various robotic manipulation tasks, achieving up to 90% success rates even with noise levels as high as 40%, highlighting its effective robustness in handling noisy preference feedback. Project page: https://shuaiyihuang.github.io/publications/TREND.

SEOct 25, 2021
Assuring Increasingly Autonomous Systems in Human-Machine Teams: An Urban Air Mobility Case Study

Siddhartha Bhattacharyya, Jennifer Davis, Anubhav Gupta et al.

As aircraft systems become increasingly autonomous, the human-machine role allocation changes and opportunities for new failure modes arise. This necessitates an approach to identify the safety requirements for the increasingly autonomous system (IAS) as well as a framework and techniques to verify and validate that an IAS meets its safety requirements. We use Crew Resource Management techniques to identify requirements and behaviors for safe human-machine teaming behaviors. We provide a methodology to verify that an IAS meets its requirements. We apply the methodology to a case study in Urban Air Mobility, which includes two contingency scenarios: unreliable sensor and aborted landing. For this case study, we implement an IAS agent in the Soar language that acts as a copilot for the selected contingency scenarios and performs takeoff and landing preparation, while the pilot maintains final decision authority. We develop a formal human-machine team architecture model in the Architectural Analysis and Design Language (AADL), with operator and IAS requirements formalized in the Assume Guarantee REasoning Environment (AGREE) Annex to AADL. We formally verify safety requirements for the human-machine team given the requirements on the IAS and operator. We develop an automated translator from Soar to the nuXmv model checking language and formally verify that the IAS agent satisfies its requirements using nuXmv. We share the design and requirements errors found in the process as well as our lessons learned.

CROct 8, 2021
A Wireless Intrusion Detection System for 802.11 WPA3 Networks

Neil Dalal, Nadeem Akhtar, Anubhav Gupta et al.

Wi-Fi (802.11) networks have become an essential part of our daily lives; hence, their security is of utmost importance. However, Wi-Fi Protected Access 3 (WPA3), the latest security certification for 802.11 standards, has recently been shown to be vulnerable to several attacks. In this paper, we first describe the attacks on WPA3 networks that have been reported in prior work; additionally, we show that a deauthentication attack and a beacon flood attack, known to be possible on a WPA2 network, are still possible with WPA3. We launch and test all the above (a total of nine) attacks using a testbed that contains an enterprise Access Point (AP) and Intrusion Detection System (IDS). Our experimental results show that the AP is vulnerable to eight out of the nine attacks and the IDS is unable to detect any of them. We propose a design for a signature-based IDS, which incorporates techniques to detect all the above attacks. Also, we implement these techniques on our testbed and verify that our IDS is able to successfully detect all the above attacks. We provide schemes for mitigating the impact of the above attacks once they are detected. We make the code to perform the above attacks as well as that of our IDS publicly available, so that it can be used for future work by the research community at large.

LGSep 9, 2021
Mining Points of Interest via Address Embeddings: An Unsupervised Approach

Abhinav Ganesan, Anubhav Gupta, Jose Mathew

Digital maps are commonly used across the globe for exploring places that users are interested in, commonly referred to as points of interest (PoI). In online food delivery platforms, PoIs could represent any major private compounds where customers could order from such as hospitals, residential complexes, office complexes, educational institutes and hostels. In this work, we propose an end-to-end unsupervised system design for obtaining polygon representations of PoIs (PoI polygons) from address locations and address texts. We preprocess the address texts using locality names and generate embeddings for the address texts using a deep learning-based architecture, viz. RoBERTa, trained on our internal address dataset. The PoI candidates are identified by jointly clustering the anonymised customer phone GPS locations (obtained during address onboarding) and the embeddings of the address texts. The final list of PoI polygons is obtained from these PoI candidates using novel post-processing steps. This algorithm identified 74.8 % more PoIs than those obtained using the Mummidi-Krumm baseline algorithm run on our internal dataset. The proposed algorithm achieves a median area precision of 98 %, a median area recall of 8 %, and a median F-score of 0.15. In order to improve the recall of the algorithmic polygons, we post-process them using building footprint polygons from the OpenStreetMap (OSM) database. The post-processing algorithm involves reshaping the algorithmic polygon using intersecting polygons and closed private roads from the OSM database, and accounting for intersection with public roads on the OSM database. We achieve a median area recall of 70 %, a median area precision of 69 %, and a median F-score of 0.69 on these post-processed polygons.

SIDec 27, 2016
Finding Influential Institutions in Bibliographic Information Networks

Anubhav Gupta, M. Narasimha Murty

Ranking in bibliographic information networks is a widely studied problem due to its many applications such as advertisement industry, funding, search engines, etc. Most of the existing works on ranking in bibliographic information network are based on ranking of research papers and their authors. But the bibliographic information network can be used for solving other important problems as well. The KDD Cup $2016$ competition considers one such problem, which is to measure the impact of research institutions, i.e. to perform ranking of research institutions. The competition took place in three phases. In this paper, we discuss our solutions for ranking institutions in each phase. We participated under team name "anu@TASL" and our solutions achieved the average NDCG@$20$ score of $0.7483$, ranking in eleventh place in the contest.

LGApr 8, 2015
Data Mining for Prediction of Human Performance Capability in the Software-Industry

Gaurav Singh Thakur, Anubhav Gupta, Sangita Gupta

The recruitment of new personnel is one of the most essential business processes which affect the quality of human capital within any company. It is highly essential for the companies to ensure the recruitment of right talent to maintain a competitive edge over the others in the market. However IT companies often face a problem while recruiting new people for their ongoing projects due to lack of a proper framework that defines a criteria for the selection process. In this paper we aim to develop a framework that would allow any project manager to take the right decision for selecting new talent by correlating performance parameters with the other domain-specific attributes of the candidates. Also, another important motivation behind this project is to check the validity of the selection procedure often followed by various big companies in both public and private sectors which focus only on academic scores, GPA/grades of students from colleges and other academic backgrounds. We test if such a decision will produce optimal results in the industry or is there a need for change that offers a more holistic approach to recruitment of new talent in the software companies. The scope of this work extends beyond the IT domain and a similar procedure can be adopted to develop a recruitment framework in other fields as well. Data-mining techniques provide useful information from the historical projects depending on which the hiring-manager can make decisions for recruiting high-quality workforce. This study aims to bridge this hiatus by developing a data-mining framework based on an ensemble-learning technique to refocus on the criteria for personnel selection. The results from this research clearly demonstrated that there is a need to refocus on the selection-criteria for quality objectives.