Ayush Pandey

CV
h-index: 31
7 papers
73 citations
Novelty: 33%
AI Score: 41

7 Papers

CV · Apr 24, 2023
Towards Realistic Generative 3D Face Models

Aashish Rai, Hiresh Gupta, Ayush Pandey et al.

In recent years, there has been significant progress in 2D generative face models fueled by applications such as animation, synthetic data generation, and digital avatars. However, due to the absence of 3D information, these 2D models often struggle to accurately disentangle facial attributes like pose, expression, and illumination, limiting their editing capabilities. To address this limitation, this paper proposes a 3D controllable generative face model to produce high-quality albedo and precise 3D shape leveraging existing 2D generative models. By combining 2D face generative models with semantic face manipulation, this method enables editing of detailed 3D rendered faces. The proposed framework utilizes an alternating descent optimization approach over shape and albedo. Differentiable rendering is used to train high-quality shapes and albedo without 3D supervision. Moreover, this approach outperforms the state-of-the-art (SOTA) methods in the well-known NoW benchmark for shape reconstruction. It also outperforms the SOTA reconstruction models in recovering rendered faces' identities across novel poses by an average of 10%. Additionally, the paper demonstrates direct control of expressions in 3D faces by exploiting latent space leading to text-based editing of 3D faces.

CV · Nov 14, 2025
Refine and Align: Confidence Calibration through Multi-Agent Interaction in VQA

Ayush Pandey, Jai Bardhan, Ishita Jain et al.

In the context of Visual Question Answering (VQA) and Agentic AI, calibration refers to how closely an AI system's confidence in its answers reflects their actual correctness. This aspect becomes especially important when such systems operate autonomously and must make decisions under visual uncertainty. While modern VQA systems, powered by advanced vision-language models (VLMs), are increasingly used in high-stakes domains like medical diagnostics and autonomous navigation due to their improved accuracy, the reliability of their confidence estimates remains under-examined. In particular, these systems often produce overconfident responses. To address this, we introduce AlignVQA, a debate-based multi-agent framework in which diverse specialized VLMs, each following a distinct prompting strategy, generate candidate answers and then engage in a two-stage interaction in which generalist agents critique, refine, and aggregate these proposals. This debate process yields confidence estimates that more accurately reflect the model's true predictive performance. We find that more calibrated specialized agents produce better aligned confidences. Furthermore, we introduce a novel differentiable calibration-aware loss function called aligncal, designed to fine-tune the specialized agents by minimizing an upper bound on the calibration error. This objective explicitly improves the fidelity of each agent's confidence estimates. Empirical results across multiple benchmark VQA datasets substantiate the efficacy of our approach, demonstrating substantial reductions in calibration discrepancies.
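
To make "calibration" concrete, the following is a minimal sketch of the standard binned expected calibration error (ECE). It is offered as an assumption about the kind of metric the abstract refers to: the abstract does not specify the metric, and this is not the paper's aligncal loss. The function name, the equal-width binning, and the default of 10 bins are illustrative choices.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Binned ECE: the sample-weighted mean of |accuracy - confidence| per bin.

    Illustrative only; not the aligncal loss described in the abstract.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to an equal-width confidence bin over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

An overconfident system, for example one that reports confidence 1.0 on every answer but is right only half the time, scores an ECE of 0.5 under this definition; a perfectly calibrated one scores 0.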

SY · Jun 21, 2019
Information Performance Tradeoffs in Control

Ayush Pandey

We focus our attention on the most common scenario in networked control systems, where the measured output from the observer is transmitted via a communication channel to the controller. Using information-theoretic results, we studied the tradeoff between the performance and the accuracy of observations due to communication constraints for such a scenario. We focused on three important cases: the additive white Gaussian noise (AWGN) channel, a channel with limited data rate, and systems with multiplicative uncertainty in the system parameters. Using known theoretical results for a rate-limited communication channel, we showed the effect of the entropy of the quantizer output on the control performance. The same was done for the case of multiplicative uncertainty in the system. For an AWGN channel, we showed the effect of channel SNR on the performance. For the analog joint source-channel coding approach (which works only for Gaussian disturbances in the system), we showed that the known lower bound is tight even for non-Gaussian system disturbances. We also compared the simulated performance of a system with known upper and lower rate-distortion bounds for all three cases. The lower bound on the rate is closely approached by a simple uniform quantization scheme, hence demonstrating its tightness.
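
The closing claim about uniform quantization can be illustrated outside the control loop. The sketch below is a static source-coding experiment, not the paper's closed-loop setup: it quantizes unit-variance Gaussian samples with a uniform quantizer, estimates the entropy of the quantizer output, and compares it with the Gaussian rate-distortion bound R(D) = 0.5 log2(variance / D). The step size, sample count, seed, and all function names are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def uniform_quantize(x: float, step: float) -> float:
    """Mid-tread uniform quantizer: round to the nearest multiple of step."""
    return step * round(x / step)


def quantizer_rate_and_distortion(samples, step):
    """Return (empirical output entropy in bits, mean squared error)."""
    n = len(samples)
    mse = sum((x - uniform_quantize(x, step)) ** 2 for x in samples) / n
    counts = Counter(round(x / step) for x in samples)  # histogram of cell indices
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy, mse


def gaussian_rate_lower_bound(variance: float, distortion: float) -> float:
    """Shannon rate-distortion function of a Gaussian source, in bits."""
    return max(0.0, 0.5 * math.log2(variance / distortion))


rng = random.Random(0)  # fixed seed so the experiment is repeatable
samples = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
step = 0.25  # hypothetical quantizer step size

entropy_bits, mse = quantizer_rate_and_distortion(samples, step)
bound_bits = gaussian_rate_lower_bound(1.0, mse)
gap_bits = entropy_bits - bound_bits  # rate paid above the lower bound
```

In this setting the distortion is close to step**2 / 12, and the entropy of the quantizer output stays within a fraction of a bit of the lower bound, which is the sense in which a simple uniform scheme can approach the bound.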

3.8 · DB · Apr 7
CobbleDB: Modelling Levelled Storage by Composition

Emilie Ma, Ayush Pandey, Annette Bieniusa et al.

We present a composition-based approach to building correct-by-construction database backing stores. In previous work, we specified the behaviour of several store variants and proved their correctness and equivalence. Here, we derive a Java implementation: the simplicity of the specification makes manual construction straightforward. We leverage spec-guaranteed store equivalence to compose performance features, then demonstrate practical value with CobbleDB, a reimplementation of RocksDB's levelled storage.

AI · Sep 29, 2025
Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology

Suvrankar Datta, Divya Buchireddygari, Lakshmi Vennela Chowdary Kaza et al.

Generalist multimodal AI systems such as large language models (LLMs) and vision language models (VLMs) are increasingly accessed by clinicians and patients alike for medical image interpretation through widely available consumer-facing chatbots. Most evaluations claiming expert level performance are on public datasets containing common pathologies. Rigorous evaluation of frontier models on difficult diagnostic cases remains limited. We developed a pilot benchmark of 50 expert-level "spot diagnosis" cases across multiple imaging modalities to evaluate the performance of frontier AI models against board-certified radiologists and radiology trainees. To mirror real-world usage, the reasoning modes of five popular frontier AI models were tested through their native web interfaces, viz. OpenAI o3, OpenAI GPT-5, Gemini 2.5 Pro, Grok-4, and Claude Opus 4.1. Accuracy was scored by blinded experts, and reproducibility was assessed across three independent runs. GPT-5 was additionally evaluated across various reasoning modes. Reasoning quality errors were assessed and a taxonomy of visual reasoning errors was defined. Board-certified radiologists achieved the highest diagnostic accuracy (83%), outperforming trainees (45%) and all AI models (best performance shown by GPT-5: 30%). Reliability was substantial for GPT-5 and o3, moderate for Gemini 2.5 Pro and Grok-4, and poor for Claude Opus 4.1. These findings demonstrate that advanced frontier models fall far short of radiologists in challenging diagnostic cases. Our benchmark highlights the present limitations of generalist AI in medical imaging and cautions against unsupervised clinical use. We also provide a qualitative analysis of reasoning traces and propose a practical taxonomy of visual reasoning errors by AI models for better understanding their failure modes, informing evaluation standards and guiding more robust model development.

SY · Dec 5, 2016
Modeling and Control of an Autonomous Three Wheeled Mobile Robot with Front Steer

Ayush Pandey, Siddharth Jha, Debashish Chakravarty

Modeling and control strategies for a design of an autonomous three-wheeled mobile robot with front-wheel steer are presented. Although the three-wheel design with front-wheel steer is common in automotive vehicles often used in public transport, its advantages in navigation and localization of autonomous vehicles are seldom utilized. We present the system model for such a robotic vehicle. A PID controller for speed control is designed for the model obtained and has been implemented in a digital control framework. The trajectory control framework, which is a challenging task for such a three-wheeled robot, has also been presented in the paper. The derived system model has been verified using experimental results obtained for the robot vehicle design. Controller performance and robustness issues have also been discussed briefly.
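
As a rough illustration of a PID speed controller in a digital (sampled) framework, here is a minimal sketch. The abstract does not give the robot model or the controller gains, so the first-order plant, its time constant, the gains, and the sample time below are all hypothetical placeholders rather than the authors' design.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Textbook discrete PID with rectangular integration (illustrative gains)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_speed(pid: DiscretePID, target: float, steps: int,
                   tau: float = 0.5, gain: float = 1.0) -> list:
    """Hypothetical first-order speed plant, tau * dv/dt = -v + gain * u,
    integrated with forward Euler at the controller's sample time."""
    v = 0.0
    history = []
    for _ in range(steps):
        u = pid.update(target, v)
        v += pid.dt * (-v + gain * u) / tau
        history.append(v)
    return history


# Example run: 30 simulated seconds at a 10 ms sample time.
pid = DiscretePID(kp=2.0, ki=1.0, kd=0.0, dt=0.01)
speeds = simulate_speed(pid, target=1.0, steps=3000)
```

The integral term is what removes the steady-state speed error in this toy plant; a real implementation would typically also need actuator saturation handling and anti-windup, which are omitted here.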

RO · Nov 1, 2016
Low Cost Autonomous Navigation and Control of a Mechanically Balanced Bicycle with Dual Locomotion Mode

Ayush Pandey, Subhamoy Mahajan, Adarsh Kosta et al.

In line with the wide-ranging efforts in technology development and innovation to make vehicles run autonomously, this paper presents an innovation to a bicycle. An ordinary daily-use bicycle was modified at low cost so that it runs autonomously while retaining its original form, i.e., the manual drive. The result is a bicycle that can be ridden normally by any person and, at the press of a switch, can run autonomously according to the needs of the user.