Jai Bardhan

CV
h-index67
9papers
34citations
Novelty44%
AI Score48

9 Papers

RODec 22, 2025
REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation

Martin Sedlacek, Pavlo Yefanov, Georgy Ponimatkin et al.

Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, which is presently difficult and expensive to evaluate in the real-world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the π_{0}, π_{0}-FAST, and GR00T N1.5 VLA models, showing that generalization and robustness remain an open challenge. More broadly, we also show that simulation gives us a valuable proxy for the real-world and allows us to systematically probe for and quantify the weaknesses and failure modes of VLAs. Project page: https://martin-sedlacek.com/realm

18.1GRMay 25
Curve Skeletonization in Continuous domain for Meshes and Point Clouds

Jai Bardhan, Ramya Hebbalaguppe, Aravind Udupa

Advancements in 3D curve skeletonization are accelerating progress across a wide range of applications. However, developing robust skeletonization algorithms that capture intricate object details remains challenging. Skeletonization via Local Separators (LS) offers an efficient graph-based approach but suffers from representation inaccuracies due to its discrete nature. To address this, we introduce CSCD, a novel framework for Curve Skeletonization in the Continuous Domain, generalizing LS to manifolds. Specifically, we present two realizations: CSCD-M for meshes and CSCD-PC for point clouds. CSCD-M leverages the intrinsic triangulation of a mesh for resilience to noise and improved topological preservation, while CSCD-PC employs tufted Laplacians for enhanced robustness. To our knowledge, CSCD-M is the first intrinsic method for curve skeletonization. Our results show CSCD-M matches LS performance across diverse meshes and outperforms LS (TOG'21) on benchmarks like Thingi10k dataset. CSCD-PC qualitatively outperforms CoverageAxis++ (Eurographics'24) and EPCS (CAG'23). Finally, we demonstrate the efficacy of CSCD in a few downstream tasks: object classification, shape segmentation, identifying handles, tunnels, and constrictions in objects. Project Website: https://cscd-skel.pages.dev

CVNov 14, 2025
Refine and Align: Confidence Calibration through Multi-Agent Interaction in VQA

Ayush Pandey, Jai Bardhan, Ishita Jain et al.

In the context of Visual Question Answering (VQA) and Agentic AI, calibration refers to how closely an AI system's confidence in its answers reflects their actual correctness. This aspect becomes especially important when such systems operate autonomously and must make decisions under visual uncertainty. While modern VQA systems, powered by advanced vision-language models (VLMs), are increasingly used in high-stakes domains like medical diagnostics and autonomous navigation due to their improved accuracy, the reliability of their confidence estimates remains under-examined. Particularly, these systems often produce overconfident responses. To address this, we introduce AlignVQA, a debate-based multi-agent framework, in which diverse specialized VLM -- each following distinct prompting strategies -- generate candidate answers and then engage in two-stage interaction: generalist agents critique, refine and aggregate these proposals. This debate process yields confidence estimates that more accurately reflect the model's true predictive performance. We find that more calibrated specialized agents produce better aligned confidences. Furthermore, we introduce a novel differentiable calibration-aware loss function called aligncal designed to fine-tune the specialized agents by minimizing an upper bound on the calibration error. This objective explicitly improves the fidelity of each agent's confidence estimates. Empirical results across multiple benchmark VQA datasets substantiate the efficacy of our approach, demonstrating substantial reductions in calibration discrepancies. Furthermore, we propose a novel differentiable calibration-aware loss to fine-tune the specialized agents and improve the quality of their individual confidence estimates based on minimising upper bound calibration error.

CVSep 1, 2024
ReMOVE: A Reference-free Metric for Object Erasure

Aditya Chandrasekar, Goirik Chakrabarty, Jai Bardhan et al.

We introduce $\texttt{ReMOVE}$, a novel reference-free metric for assessing object erasure efficacy in diffusion-based image editing models post-generation. Unlike existing measures such as LPIPS and CLIPScore, $\texttt{ReMOVE}$ addresses the challenge of evaluating inpainting without a reference image, common in practical scenarios. It effectively distinguishes between object removal and replacement. This is a key issue in diffusion models due to stochastic nature of image generation. Traditional metrics fail to align with the intuitive definition of inpainting, which aims for (1) seamless object removal within masked regions (2) while preserving the background continuity. $\texttt{ReMOVE}$ not only correlates with state-of-the-art metrics and aligns with human perception but also captures the nuanced aspects of the inpainting process, providing a finer-grained evaluation of the generated outputs.

78.7ROMar 26
Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning

Jai Bardhan, Patrik Drozdik, Josef Sivic et al.

Action-conditioned robot world models generate future video frames of the manipulated scene given a robot action sequence, offering a promising alternative for simulating tasks that are difficult to model with traditional physics engines. However, these models are optimized for short-term prediction and break down when deployed autoregressively: each predicted clip feeds back as context for the next, causing errors to compound and visual quality to rapidly degrade. We address this through the following contributions. First, we introduce a reinforcement learning (RL) post-training scheme that trains the world model on its own autoregressive rollouts rather than on ground-truth histories. We achieve this by adapting a recent contrastive RL objective for diffusion models to our setting and show that its convergence guarantees carry over exactly. Second, we design a training protocol that generates and compares multiple candidate variable-length futures from the same rollout state, reinforcing higher-fidelity predictions over lower-fidelity ones. Third, we develop efficient, multi-view visual fidelity rewards that combine complementary perceptual metrics across camera views and are aggregated at the clip level for dense, low-variance training signal. Fourth, we show that our approach establishes a new state-of-the-art for rollout fidelity on the DROID dataset, outperforming the strongest baseline on all metrics (e.g., LPIPS reduced by 14% on external cameras, SSIM improved by 9.1% on the wrist camera), winning 98% of paired comparisons, and achieving an 80% preference rate in a blind human study.

LGFeb 6, 2025
HEP-JEPA: A foundation model for collider physics using joint embedding predictive architecture

Jai Bardhan, Radhikesh Agrawal, Abhiram Tilak et al.

We present a transformer architecture-based foundation model for tasks at high-energy particle colliders such as the Large Hadron Collider. We train the model to classify jets using a self-supervised strategy inspired by the Joint Embedding Predictive Architecture. We use the JetClass dataset containing 100M jets of various known particles to pre-train the model with a data-centric approach -- the model uses a fraction of the jet constituents as the context to predict the embeddings of the unseen target constituents. Our pre-trained model fares well with other datasets for standard classification benchmark tasks. We test our model on two additional downstream tasks: top tagging and differentiating light-quark jets from gluon jets. We also evaluate our model with task-specific metrics and baselines and compare it with state-of-the-art models in high-energy physics. Project site: https://hep-jepa.github.io/

LGDec 18, 2024
Constructing sensible baselines for Integrated Gradients

Jai Bardhan, Cyrin Neeraj, Mihir Rawat et al.

Machine learning methods have seen a meteoric rise in their applications in the scientific community. However, little effort has been put into understanding these "black box" models. We show how one can apply integrated gradients (IGs) to understand these models by designing different baselines, by taking an example case study in particle physics. We find that the zero-vector baseline does not provide good feature attributions and that an averaged baseline sampled from the background events provides consistently more reasonable attributions.

HEP-PHMay 12, 2025
Tagging fully hadronic exotic decays of the vectorlike $\mathbf{B}$ quark using a graph neural network

Jai Bardhan, Tanumoy Mandal, Subhadip Mitra et al.

Following up on our earlier study in [J. Bardhan et al., Machine learning-enhanced search for a vectorlike singlet B quark decaying to a singlet scalar or pseudoscalar, Phys. Rev. D 107 (2023) 115001; arXiv:2212.02442], we investigate the LHC prospects of pair-produced vectorlike $B$ quarks decaying exotically to a new gauge-singlet (pseudo)scalar field $Φ$ and a $b$ quark. After the electroweak symmetry breaking, the $Φ$ decays predominantly to $gg/bb$ final states, leading to a fully hadronic $2b+4j$ or $6b$ signature. Because of the large Standard Model background and the lack of leptonic handles, it is a difficult channel to probe. To overcome the challenge, we employ a hybrid deep learning model containing a graph neural network followed by a deep neural network. We estimate that such a state-of-the-art deep learning analysis pipeline can lead to a performance comparable to that in the semi-leptonic mode, taking the discovery (exclusion) reach up to about $M_B=1.8\:(2.4)$ TeV at HL-LHC when $B$ decays fully exotically, i.e., BR$(B \to bΦ) = 100\%$.

HEP-PHDec 12, 2024
Loss function to optimise signal significance in particle physics

Jai Bardhan, Cyrin Neeraj, Subhadip Mitra et al.

We construct a surrogate loss to directly optimise the significance metric used in particle physics. We evaluate our loss function for a simple event classification task using a linear model and show that it produces decision boundaries that change according to the cross sections of the processes involved. We find that the models trained with the new loss have higher signal efficiency for similar values of estimated signal significance compared to ones trained with a cross-entropy loss, showing promise to improve sensitivity of particle physics searches at colliders.