Parv Maheshwari

RO
h-index: 17
Papers: 8
Citations: 18
Novelty: 33%
AI Score: 41

8 Papers

CV | Feb 5 | Code
AnyThermal: Towards Learning Universal Representations for Thermal Perception

Parv Maheshwari, Jay Karhade, Yogesh Chawla et al.

We present AnyThermal, a thermal backbone that captures robust, task-agnostic thermal features suitable for a variety of tasks, such as cross-modal place recognition, thermal segmentation, and monocular depth estimation from thermal images. Existing thermal backbones are trained for a specific task on small-scale data, which limits their utility to a single environment and task. In contrast, AnyThermal can be used across a wide range of environments (indoor, aerial, off-road, urban) and tasks, all without task-specific training. Our key insight is to distill the feature representations of visual foundation models such as DINOv2 into a thermal encoder using thermal data from these multiple environments. To bridge the diversity gap of existing RGB-Thermal datasets, we introduce the TartanRGBT platform, the first open-source data collection platform with synced RGB-Thermal image acquisition. We use this payload to collect the TartanRGBT dataset, a diverse and balanced dataset collected in 4 environments. We demonstrate the efficacy of AnyThermal and TartanRGBT, achieving state-of-the-art results with improvements of up to 36% across diverse environments and downstream tasks on existing datasets.
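The distillation idea can be illustrated with a minimal sketch. It assumes paired RGB and thermal images, a frozen teacher that yields one feature vector per RGB image, and a student thermal encoder trained to match those vectors. The cosine-distance objective and the function names are illustrative assumptions, not the loss reported for AnyThermal.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def distillation_loss(teacher_feats, student_feats):
    """Mean (1 - cosine similarity) over paired teacher/student features.

    teacher_feats: features of the RGB images from the frozen teacher (assumed).
    student_feats: features of the paired thermal images from the student.
    """
    losses = [
        1.0 - cosine_similarity(t, s)
        for t, s in zip(teacher_feats, student_feats)
    ]
    return sum(losses) / len(losses)


# Toy features: the loss ignores magnitude and only penalizes direction.
teacher = [[1.0, 0.0], [0.0, 2.0]]
aligned_student = [[2.0, 0.0], [0.0, 1.0]]
orthogonal_student = [[0.0, 1.0], [1.0, 0.0]]
```

In this toy setup the aligned student incurs zero loss and the orthogonal student incurs a loss of one, so minimizing the loss pulls thermal features toward the teacher's directions.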

CV | Aug 18, 2022 | Code
Reproducibility Report: Contrastive Learning of Socially-aware Motion Representations

Roopsa Sen, Sidharth Sinha, Parv Maheshwari et al.

The following paper is a reproducibility report for "Social NCE: Contrastive Learning of Socially-aware Motion Representations", published in ICCV 2021, written as part of the ML Reproducibility Challenge 2021. The original code was made available by the authors (https://github.com/vita-epfl/social-nce). We attempted to verify the results claimed by the authors and reimplemented their code in PyTorch Lightning.

AI | Jul 16, 2022
[Reproducibility Report] Path Planning using Neural A* Search

Shreya Bhatt, Aayush Jain, Parv Maheshwari et al.

The following paper is a reproducibility report for "Path Planning using Neural A* Search", published in ICML 2021, written as part of the ML Reproducibility Challenge 2021. The original paper proposes the Neural A* planner and claims that it achieves an optimal balance between the reduction of node expansions and path accuracy. We verify this claim by reimplementing the model in a different framework and reproducing the data published in the original paper. We have also provided a code-flow diagram to aid comprehension of the code structure. As extensions to the original paper, we explore the effects of (1) generalizing the model by training it on a shuffled dataset, (2) introducing dropout, (3) implementing empirically chosen hyperparameters as trainable parameters in the model, (4) altering the network model to Generative Adversarial Networks (GANs) to introduce stochasticity, (5) modifying the encoder from Unet to Unet++, and (6) incorporating cost maps obtained from the Neural A* module in other variations of A* search.
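The trade-off between node expansions and path quality can be made concrete with a plain grid A* that reads its step costs from a cost map and counts expansions. This is a sketch under stated assumptions, not the differentiable Neural A* module: the 4-connected grid, the Manhattan heuristic, and the names `astar` and `cost_map` are chosen here for illustration, and the cost map is a hand-written stand-in for a learned one.

```python
import heapq


def astar(grid, cost_map, start, goal):
    """4-connected grid A*; returns (path, number of expanded nodes).

    grid[r][c] == 1 marks an obstacle. cost_map[r][c] is the cost of
    entering a cell; values >= 1 keep the Manhattan heuristic admissible.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0.0, start)]
    best_g = {start: 0.0}
    parent = {}
    closed = set()
    expansions = 0

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue  # stale heap entry
        closed.add(cell)
        expansions += 1
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1], expansions
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + cost_map[nr][nc]
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    parent[(nr, nc)] = (r, c)
                    heapq.heappush(
                        open_heap, (new_g + heuristic((nr, nc)), new_g, (nr, nc))
                    )
    return None, expansions  # goal unreachable


# 3x3 map with one obstacle in the centre and uniform step costs.
grid = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
uniform_cost = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
path, expansions = astar(grid, uniform_cost, (0, 0), (2, 2))
```

Swapping `uniform_cost` for a different cost map changes both the returned path and the expansion count, which is the quantity pair the report compares.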

RO | Aug 19, 2022
[Re] Differentiable Spatial Planning using Transformers

Rohit Ranjan, Himadri Bhakta, Animesh Jha et al.

This report covers our effort to reproduce the paper 'Differentiable Spatial Planning using Transformers' by Chaplot et al. The paper considers the problem of spatial path planning in a differentiable way. The authors show that their proposed method, Spatial Planning Transformers, outperforms prior data-driven models and leverages differentiable structures to simultaneously learn mapping without a ground-truth map. We verify these claims by reproducing their experiments and testing their method on new data. We also investigate the stability of planning accuracy on maps with increased obstacle complexity. Our efforts to investigate and verify what the Mapper module learns failed because of limited computational resources and because the authors could not be reached.

RO | Mar 18, 2024
Deep Bayesian Future Fusion for Self-Supervised, High-Resolution, Off-Road Mapping

Shubhra Aich, Wenshan Wang, Parv Maheshwari et al.

High-speed off-road navigation requires long-range, high-resolution maps to enable robots to safely navigate over different surfaces while avoiding dangerous obstacles. However, due to limited computational power and sensing noise, most approaches to off-road mapping focus on producing coarse (20-40cm) maps of the environment. In this paper, we propose Future Fusion, a framework capable of generating dense, high-resolution maps (30m forward at 2cm) from sparse sensing data. This is accomplished by (1) an efficient realization of the well-known Bayes filter within standard deep learning models that explicitly accounts for the sparsity pattern in stereo and LiDAR depth data, and (2) leveraging perceptual losses common in generative image completion. The proposed methodology outperforms the conventional baselines. Moreover, the learned features and the completed dense maps lead to improvements in the downstream navigation task.
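A hand-written Bayes filter over a sparse grid gives a feel for the first ingredient. The sketch below is not the learned Future Fusion model: it assumes a binary per-cell state, a uniform 0.5 prior in the sensor model, and `None` as the marker for cells without a measurement, all of which are illustrative choices.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bayes_update(prior, observations):
    """Fuse one sparse observation grid into a grid of per-cell probabilities.

    Observed cells are updated by adding log-odds (binary Bayes filter with
    a 0.5 sensor prior). Cells marked None carry no measurement and keep
    their prior, which is how the sparsity pattern is respected here.
    """
    fused = []
    for prior_row, obs_row in zip(prior, observations):
        row = []
        for p, o in zip(prior_row, obs_row):
            if o is None:
                row.append(p)
            else:
                row.append(sigmoid(logit(p) + logit(o)))
        fused.append(row)
    return fused


# Two frames: the second cell is only observed in the second frame.
belief = [[0.5, 0.5]]
belief = bayes_update(belief, [[0.8, None]])
belief = bayes_update(belief, [[0.8, 0.8]])
```

Repeated agreeing measurements sharpen the belief in a cell, while unobserved cells are left untouched until data arrives.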

AI | Nov 24, 2025
Cross Domain Evaluation of Multimodal Chain-of-Thought Reasoning of different datasets into the Amazon CoT Framework

Nitya Tiwari, Parv Maheshwari, Vidisha Agarwal

While recent work has extended Chain-of-Thought (CoT) reasoning to multimodal settings, achieving state-of-the-art results on science question answering benchmarks like ScienceQA, the generalizability of these approaches across diverse domains remains underexplored. This work presents a comprehensive analysis of Multimodal Chain-of-Thought (Multimodal-CoT) reasoning, evaluating its effectiveness on the A-OKVQA, OKVQA and ChartQA datasets, which require broad commonsense and world knowledge beyond scientific reasoning. We implement the two-stage framework proposed by Zhang et al. [3], which separates rationale generation from answer inference and integrates vision features through a gated fusion mechanism with T5-based language models. Through systematic ablation studies, we analyze the contributions of vision features, rationale quality, and architectural choices. Our findings reveal that while vision integration significantly reduces hallucination in rationale generation, the effectiveness of CoT reasoning varies substantially across question types, with commonsense reasoning presenting particular challenges. This work provides practical insights for researchers implementing multimodal reasoning systems and identifies key areas for future improvement in cross-domain generalization.
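The gated fusion step can be sketched in a few lines. This is a simplified illustration, not the framework's implementation: it assumes the language and vision features are already aligned to the same length and replaces the learned weight matrices with hypothetical per-element weights `w_lang` and `w_vis`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_lang, h_vis, w_lang, w_vis):
    """Blend language and vision features with a sigmoid gate.

    For each element: gate = sigmoid(w_lang * h_lang + w_vis * h_vis),
    fused = (1 - gate) * h_lang + gate * h_vis.
    """
    fused = []
    for lang, vis, wl, wv in zip(h_lang, h_vis, w_lang, w_vis):
        gate = sigmoid(wl * lang + wv * vis)
        fused.append((1.0 - gate) * lang + gate * vis)
    return fused


# Zero weights give gate = 0.5, so the output is the average of both inputs.
balanced = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0])
```

A gate near 1 passes mostly vision features and a gate near 0 passes mostly language features, so the model can decide per element how much visual evidence to admit.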

RO | Jun 26, 2025
ThermalDiffusion: Visual-to-Thermal Image-to-Image Translation for Autonomous Navigation

Shruti Bansal, Wenshan Wang, Yifei Liu et al.

Autonomous systems rely on sensors to estimate the environment around them. However, cameras, LiDARs, and RADARs have their own limitations. In nighttime or degraded environments such as fog, mist, or dust, thermal cameras can provide valuable information regarding the presence of objects of interest due to their heat signature. They make it easy to identify humans and vehicles, which are usually at higher temperatures than their surroundings. In this paper, we focus on the adoption of thermal cameras for robotics and automation, where the biggest hurdle is the lack of data. Several multi-modal datasets are available for driving robotics research in tasks such as scene segmentation, object detection, and depth estimation, which are the cornerstone of autonomous systems. However, they are found to be lacking in thermal imagery. Our paper proposes a solution to augment these datasets with synthetic thermal data to enable widespread and rapid adoption of thermal cameras. We explore the use of conditional diffusion models to convert existing RGB images to thermal images using self-attention to learn the thermal properties of real-world objects.

RO | Sep 15, 2021
Local NMPC on Global Optimised Path for Autonomous Racing

Dvij Kalaria, Parv Maheshwari, Animesh Jha et al.

The paper presents a strategy for the control of an autonomous racing car on a pre-mapped track. Using a dynamic model of the vehicle, the optimal racing line is computed, taking track boundaries into account. With the optimal racing line as a reference, a local nonlinear model predictive controller (NMPC) is proposed, which takes into account multiple local objectives, such as making more progress along the race line, avoiding collisions with opponent vehicles, and using drafting to achieve more progress.