IV Sep 20, 2024 Code
Multiscale Encoder and Omni-Dimensional Dynamic Convolution Enrichment in nnU-Net for Brain Tumor Segmentation
Sahaj K. Mistry, Sourav Saini, Aashray Gupta et al.
Brain tumor segmentation plays a crucial role in computer-aided diagnosis. This study introduces a novel segmentation algorithm utilizing a modified nnU-Net architecture. Within the nnU-Net architecture's encoder section, we enhance conventional convolution layers by incorporating omni-dimensional dynamic convolution (ODConv) layers, resulting in improved feature representation. Simultaneously, we propose a multi-scale attention strategy that harnesses contemporary insights from various scales. Our model's efficacy is demonstrated on diverse datasets from the BraTS-2023 challenge. Integrating ODConv layers and multi-scale features yields substantial improvement in the nnU-Net architecture's performance across multiple tumor segmentation datasets. Remarkably, our proposed model attains good accuracy during validation for the BraTS Africa dataset. The ODConv source code along with full training code is available on GitHub.
25.1 RO May 23
Vision-Guided Outdoor Flight and Obstacle Evasion via Reinforcement Learning
Shiladitya Dutta, Aayush Gupta, Varun Saran et al.
Although quadcopters boast impressive traversal capabilities enabled by their omnidirectional maneuverability, the need for continuous pilot control in complex environments impedes their application in GNSS- and telemetry-denied scenarios. To this end, we propose a novel sensorimotor policy that uses stereo-vision depth and visual-inertial odometry (VIO) to autonomously navigate through obstacles in an unknown environment to reach a goal point. The policy comprises a pre-trained autoencoder as the perception head, followed by a planning and control LSTM network that outputs velocity commands that can be followed by an off-the-shelf commercial drone. We leverage reinforcement and privileged learning paradigms to train the policy in simulation through a two-stage process: 1) initial training with optimal trajectories generated by a global motion planner acting as a supervisory backbone, 2) further fine-tuning in a curriculum environment. To bridge the sim-to-real gap, we employ domain randomization and reward shaping to create a policy that is robust to both noise and domain shift. In outdoor experiments, our approach achieves successful zero-shot transfer to both obstacle environments and a drone platform that were never encountered during training.
CV Mar 14, 2025 Code
Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models
Shree Singhi, Aayan Yadav, Aayush Gupta et al.
As AI-generated sensitive images become more prevalent, identifying their source is crucial for distinguishing them from real images. Conventional image watermarking methods are vulnerable to common transformations like filters, lossy compression, and screenshots, often applied during social media sharing. Watermarks can also be faked or removed if models are open-sourced or leaked since images can be rewatermarked. To address these limitations, we have developed a three-part framework for secure, transformation-resilient AI content provenance detection. We develop an adversarially robust state-of-the-art perceptual hashing model, DinoHash, derived from DINOv2, which is robust to common transformations like filters, compression, and crops. Additionally, we integrate a Multi-Party Fully Homomorphic Encryption (MP-FHE) scheme into our proposed framework to ensure the protection of both user queries and registry privacy. Furthermore, we improve previous work on AI-generated media detection. This approach is useful in cases where the content is absent from our registry. DinoHash significantly improves average bit accuracy by 12% over state-of-the-art watermarking and perceptual hashing methods while maintaining superior true positive rate (TPR) and false positive rate (FPR) tradeoffs across various transformations. Our AI-generated media detection results show a 25% improvement in classification accuracy on commonly used real-world AI image generators over existing algorithms. By combining perceptual hashing, MP-FHE, and an AI content detection model, our proposed framework provides better robustness and privacy compared to previous work.
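The bit-accuracy metric mentioned in this abstract can be illustrated with a small sketch. It assumes, as is common for perceptual hashes, that bit accuracy means the fraction of hash bits that agree between the hash of an original image and the hash of its transformed copy. The function names, the integer encoding of hashes, and the 0.9 match threshold are all hypothetical and are not taken from the paper.

```python
def bit_accuracy(hash_a: int, hash_b: int, n_bits: int) -> float:
    """Fraction of the n_bits positions on which two hashes agree."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    mask = (1 << n_bits) - 1
    # XOR leaves a 1 wherever the two hashes disagree.
    differing = bin((hash_a ^ hash_b) & mask).count("1")
    return 1.0 - differing / n_bits


def is_match(hash_a: int, hash_b: int, n_bits: int, threshold: float = 0.9) -> bool:
    """Hypothetical registry lookup rule: match if enough bits agree."""
    return bit_accuracy(hash_a, hash_b, n_bits) >= threshold
```

For example, two 4-bit hashes that differ in one position have a bit accuracy of 0.75, which would fall below the illustrative threshold.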
AI Sep 27, 2025
Fact Grounded Attention: Eliminating Hallucination in Large Language Models Through Attention Level Knowledge Integration
Aayush Gupta
"The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge." Large Language Models have conquered natural language but remain prisoners of their own probabilistic nature--confidently hallucinating facts they never truly knew. We present Fact Grounded Attention (FGA), a novel architectural modification that transforms unreliable language models into deterministic truth tellers by injecting verifiable knowledge directly into the attention mechanism. Unlike existing approaches that patch hallucinations after generation or prepend retrieved text, FGA intervenes at the mathematical heart of the transformer--the pre-softmax attention scores--creating a model that cannot hallucinate when facts exist in its knowledge base. Our experiments across 1,107 technical queries spanning smartphones, laptops, and electric vehicles demonstrate a transformation from 6.3% accuracy in vanilla Llama 3.2 to 99.7% accuracy with FGA. More critically, knowledge updates occur in under one second without retraining, compared to hours for parameter editing approaches. FGA doesn't just reduce hallucination--it eliminates it entirely for verifiable facts, marking a fundamental shift from probabilistic approximation to deterministic precision in neural language generation.
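To make "intervening at the pre-softmax attention scores" concrete, here is a minimal sketch for a single query vector. The abstract does not say what form the injected knowledge takes, so the additive per-key `fact_bias` term below is an assumption made purely for illustration, and the function names are hypothetical. It is not the FGA method itself.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    fact_bias: Sequence[float],
) -> List[float]:
    """Scaled dot-product scores plus an additive bias, then softmax.

    fact_bias is a hypothetical stand-in for knowledge-derived scores;
    it is added BEFORE the softmax, so it reshapes the whole distribution.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    biased = [s + b for s, b in zip(scores, fact_bias)]
    return softmax(biased)
```

With a zero bias this reduces to ordinary scaled dot-product attention weights; a large positive bias on one key shifts most of the attention mass to that key.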
LG Aug 17, 2025
Cold-RL: Learning Cache Eviction with Offline Reinforcement Learning for NGINX
Aayush Gupta, Arpit Bhayani
Web proxies such as NGINX commonly rely on least-recently-used (LRU) eviction, which is size agnostic and can thrash under periodic bursts and mixed object sizes. We introduce Cold-RL, a learned eviction policy for NGINX that replaces LRU's forced-expire path with a dueling Deep Q-Network served by an ONNX sidecar within a strict microsecond budget. On each eviction, Cold-RL samples the K least-recently-used objects, extracts six lightweight features (age, size, hit count, inter-arrival time, remaining TTL, and last origin RTT), and requests a bitmask of victims; a hard timeout of 500 microseconds triggers immediate fallback to native LRU. Policies are trained offline by replaying NGINX access logs through a cache simulator with a simple reward: a retained object earns one point if it is hit again before TTL expiry. We compare against LRU, LFU, size-based, adaptive LRU, and a hybrid baseline on two adversarial workloads. With a 25 MB cache, Cold-RL raises hit ratio from 0.1436 to 0.3538, a 146 percent improvement over the best classical baseline; at 100 MB, from 0.7530 to 0.8675, a 15 percent gain; and at 400 MB it matches classical methods (about 0.918). Inference adds less than 2 percent CPU overhead and keeps 95th percentile eviction latency within budget. To our knowledge, this is the first reinforcement learning eviction policy integrated into NGINX with strict SLOs.
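The eviction flow described in this abstract (sample the K least-recently-used objects, extract six features, ask the policy for a bitmask, fall back to LRU on timeout) can be sketched as follows. This is a simplified illustration, not the NGINX integration: the `ObjectStats` fields, the `choose_victims` name, and the convention that the policy raises `TimeoutError` when it misses its budget are all assumptions. Falling back to LRU when the mask is malformed or selects nothing is also a choice made here so that an eviction always makes progress.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ObjectStats:
    """The six per-object features named in the abstract (field names are hypothetical)."""
    age_s: float
    size_bytes: int
    hit_count: int
    inter_arrival_s: float
    ttl_remaining_s: float
    origin_rtt_s: float


Policy = Callable[[List[ObjectStats]], List[int]]


def choose_victims(cache: "OrderedDict[str, ObjectStats]", k: int, policy: Policy) -> List[str]:
    """Return the keys to evict; the cache is ordered least-recently-used first."""
    candidates = list(cache.keys())[:k]
    if not candidates:
        return []
    features = [cache[key] for key in candidates]
    try:
        mask = policy(features)  # one bit per candidate, 1 means evict
    except TimeoutError:
        return candidates[:1]  # fallback: plain LRU eviction
    if len(mask) != len(candidates):
        return candidates[:1]  # malformed answer, treat like a timeout
    victims = [key for key, bit in zip(candidates, mask) if bit]
    return victims or candidates[:1]
```

A quick check of the reported gains: (0.3538 - 0.1436) / 0.1436 is about 1.46, matching the stated 146 percent, and (0.8675 - 0.7530) / 0.7530 is about 0.15, matching the stated 15 percent.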
CR Aug 12, 2025
Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs
Aayush Gupta
Large language models (LLMs) remain acutely vulnerable to prompt injection and related jailbreak attacks; heuristic guardrails (rules, filters, LLM judges) are routinely bypassed. We present Contextual Integrity Verification (CIV), an inference-time security architecture that attaches cryptographically signed provenance labels to every token and enforces a source-trust lattice inside the transformer via a pre-softmax hard attention mask (with optional FFN/residual gating). CIV provides deterministic, per-token non-interference guarantees on frozen models: lower-trust tokens cannot influence higher-trust representations. On benchmarks derived from recent taxonomies of prompt-injection vectors (Elite-Attack + SoK-246), CIV attains 0% attack success rate under the stated threat model while preserving 93.1% token-level similarity and showing no degradation in model perplexity on benign tasks; we note a latency overhead attributable to a non-optimized data path. Because CIV is a lightweight patch -- no fine-tuning required -- we demonstrate drop-in protection for Llama-3-8B and Mistral-7B. We release a reference implementation, an automated certification harness, and the Elite-Attack corpus to support reproducible research.
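The non-interference rule stated here (lower-trust tokens cannot influence higher-trust representations) can be illustrated with a toy hard attention mask. This sketch makes several simplifying assumptions that are not from the paper: trust is an integer level (a total order rather than a general lattice), the cryptographic signing of labels and the causal mask are omitted, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def trust_mask(trust: Sequence[int]) -> List[List[bool]]:
    """mask[i][j] is True when query token i may attend to key token j.

    A token may only read from tokens of equal or higher trust, so
    low-trust content can never flow into a higher-trust representation.
    """
    return [[tj >= ti for tj in trust] for ti in trust]


def masked_softmax(scores: Sequence[float], allowed: Sequence[bool]) -> List[float]:
    """Softmax in which disallowed positions receive exactly zero weight."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    if not kept:
        raise ValueError("at least one position must be allowed")
    m = max(kept)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]
```

With trust levels `[2, 1, 0]` for a system, user, and web token, the system token's row allows only itself, so even very high raw scores on the lower-trust tokens contribute nothing to its output.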
IV Jun 11, 2024
Progress Towards Decoding Visual Imagery via fNIRS
Michel Adamic, Wellington Avelino, Anna Brandenberger et al.
We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system.
CL Jun 6, 2024
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques
Sander Schulhoff, Michael Ilie, Nishant Balepur et al.
Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across diverse industries and research domains. Developers and end-users interact with these systems through the use of prompting and prompt engineering. Although prompt engineering is a widely adopted and extensively researched area, it suffers from conflicting terminology and a fragmented ontological understanding of what constitutes an effective prompt due to its relatively recent emergence. We establish a structured understanding of prompt engineering by assembling a taxonomy of prompting techniques and analyzing their applications. We present a detailed vocabulary of 33 terms, a taxonomy of 58 LLM prompting techniques, and 40 techniques for other modalities. Additionally, we provide best practices and guidelines for prompt engineering, including advice for prompting state-of-the-art (SOTA) LLMs such as ChatGPT. We further present a meta-analysis of the entire literature on natural language prefix-prompting. As a culmination of these efforts, this paper presents the most comprehensive survey on prompt engineering to date.
CV Feb 28, 2024
Human Shape and Clothing Estimation
Aayush Gupta, Aditya Gulati, Himanshu et al.
Human shape and clothing estimation has gained significant prominence in various domains, including online shopping, fashion retail, augmented reality (AR), virtual reality (VR), and gaming. The visual representation of human shape and clothing has become a focal point for computer vision researchers in recent years. This paper presents a comprehensive survey of the major works in the field, focusing on four key aspects: human shape estimation, fashion generation, landmark detection, and attribute recognition. For each of these tasks, the survey examines recent advancements, discusses their strengths and limitations, and compares qualitative differences in approaches and outcomes. By exploring the latest developments in human shape and clothing estimation, this survey aims to provide a comprehensive understanding of the field and inspire future research in this rapidly evolving domain.
LG Feb 28, 2022
Machine learning techniques to identify antibiotic resistance in patients diagnosed with various skin and soft tissue infections
Farnaz H. Foomani, Shahzad Mirza, Sahjid Mukhida et al.
Skin and soft tissue infections (SSTIs) are among the most frequently observed diseases in ambulatory and hospital settings. Resistance of diverse bacterial pathogens to antibiotics is a significant cause of severe SSTIs, and treatment failure results in morbidity, mortality, and increased cost of hospitalization. Therefore, antimicrobial surveillance is essential to predict antibiotic resistance trends and monitor the results of medical interventions. To address this, we developed machine learning (ML) models (deep and conventional algorithms) to predict antimicrobial resistance using antibiotic susceptibility testing (ABST) data collected from patients clinically diagnosed with primary and secondary pyoderma over a period of one year. We trained an individual ML algorithm on each antimicrobial family to determine whether a Gram-Positive Cocci (GPC) or Gram-Negative Bacilli (GNB) bacteria will resist the corresponding antibiotic. For this purpose, clinical and demographic features from the patient and data from ABST were employed in training. We achieved an Area Under the Curve (AUC) of 0.68-0.98 in GPC and 0.56-0.93 in GNB bacteria, depending on the antimicrobial family. We also conducted a correlation analysis to determine the linear relationship between each feature and antimicrobial families in different bacteria. ML techniques suggest that a predictable nonlinear relationship exists between patients' clinical-demographic characteristics and antibiotic resistance; however, the accuracy of this prediction depends on the type of the antimicrobial family.
AI Nov 1, 2021
An AI-powered Smart Routing Solution for Payment Systems
Ramya Bygari, Aayush Gupta, Shashwat Raghuvanshi et al.
In the current era of digitization, online payment systems are attracting considerable interest. Improving the efficiency of a payment system is important since it has a substantial impact on revenues for businesses. A gateway is an integral component of a payment system through which every transaction is routed. In an online payment system, payment processors integrate with these gateways by means of various configurations such as pricing, methods, risk checks, etc. These configurations are called terminals. Each gateway can have multiple terminals associated with it. Routing a payment transaction through the best terminal is crucial to increase the probability of a payment transaction being successful. Machine learning (ML) and artificial intelligence (AI) techniques can be used to accurately predict the best terminals based on their previous performance and various payment-related attributes. We have devised a pipeline consisting of static and dynamic modules. The static module does the initial filtering of the terminals using static rules and a logistic regression model that predicts gateway downtimes. Subsequently, the dynamic module computes many novel features based on success rate, payment attributes, time lag, etc. to model the terminal behaviour accurately. These features are updated in real time through a feedback loop using an adaptive time decay rate algorithm and passed to a random forest classifier to predict the success probabilities for every terminal. This pipeline is currently in production at Razorpay, routing millions of transactions in real time, and has given a 4-6% improvement in success rate across all payment methods (credit card, debit card, UPI, net banking). This has made our payment system more resilient to performance drops, which has improved the user experience, instilled more trust in the merchants, and boosted the revenue of the business.
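One way to picture a time-decayed success-rate feature of the kind this abstract describes is an exponentially weighted counter per terminal. The sketch below is an assumption-laden simplification: it uses a fixed `decay_rate`, whereas the abstract describes an adaptive decay rate whose details are not given, and the `TerminalStats` class and its fields are hypothetical. The downstream random forest classifier is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class TerminalStats:
    """Exponentially time-decayed success and attempt counters for one terminal."""
    successes: float = 0.0
    attempts: float = 0.0
    last_update: float = 0.0

    def record(self, success: bool, now: float, decay_rate: float) -> None:
        """Fold in one payment outcome observed at time `now` (in seconds)."""
        elapsed = max(0.0, now - self.last_update)
        weight = math.exp(-decay_rate * elapsed)  # older evidence counts for less
        self.successes = self.successes * weight + (1.0 if success else 0.0)
        self.attempts = self.attempts * weight + 1.0
        self.last_update = now

    def success_rate(self) -> float:
        """Decayed success rate, or 0.0 before any outcome has been recorded."""
        return self.successes / self.attempts if self.attempts > 0 else 0.0
```

With `decay_rate=0.0` this is a plain running average; with a positive rate, a recent failure outweighs an old success, which is the behaviour that lets a router react quickly to a terminal that has started failing.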
AI Jan 4, 2018
A Decision-theoretic Approach to Detection-based Target Search with a UAV
Aayush Gupta, Daniel Bessonov, Patrick Li
Search and rescue missions and surveillance require finding targets in a large area. These tasks often use unmanned aerial vehicles (UAVs) with cameras to detect and move towards a target. However, common UAV approaches make two simplifying assumptions. First, they assume that observations made from different heights are deterministically correct. In practice, observations are noisy, with the noise increasing as the height used for observations increases. Second, they assume that a motion command executes correctly, which may not happen due to wind and other environmental factors. To address these, we propose a sequential algorithm that determines actions in real time based on observations, using partially observable Markov decision processes (POMDPs). Our formulation handles uncertainty and errors in both observations and motion. We run offline simulations and learn a policy. This policy is run on a UAV to find the target efficiently. We employ a novel compact formulation to represent the coordinates of the drone relative to the target coordinates. Our POMDP policy finds the target up to 3.4 times faster when compared to a heuristic policy.
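The idea that observations become noisier with height can be illustrated with the belief-update step that sits at the core of any POMDP. The sketch below covers only that Bayesian observation update over a small set of grid cells; motion uncertainty and the learned policy are omitted. The linear noise model in `error_rate`, its constants, and the function names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def error_rate(height: float) -> float:
    """Hypothetical detector error that grows with height, clamped to [0.01, 0.45]."""
    return min(0.45, max(0.01, 0.05 * height))


def update_belief(
    belief: Sequence[float], viewed_cell: int, detected: bool, height: float
) -> List[float]:
    """Bayes update of P(target in cell) after looking at one cell.

    The detector fires with probability 1 - err if the target is in the
    viewed cell (true positive) and with probability err otherwise (false positive).
    """
    err = error_rate(height)
    posterior = []
    for cell, prior in enumerate(belief):
        p_detect = (1.0 - err) if cell == viewed_cell else err
        likelihood = p_detect if detected else 1.0 - p_detect
        posterior.append(prior * likelihood)
    total = sum(posterior)
    return [p / total for p in posterior]
```

Starting from a uniform belief over four cells, a detection from a low height concentrates more probability on the viewed cell than the same detection from a greater height, which is why descending can be worth its time cost.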