CV, Oct 28, 2022
Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object Transport
Sriram Narayanan, Dinesh Jayaraman, Manmohan Chandraker
We address key challenges in long-horizon embodied exploration and navigation by proposing a new object transport task and a novel modular framework for temporally extended navigation. Our first contribution is the design of a novel Long-HOT environment focused on deep exploration and long-horizon planning where the agent is required to efficiently find and pick up target objects to be carried and dropped at a goal location, with load constraints and optional access to a container if it finds one. Further, we propose a modular hierarchical transport policy (HTP) that builds a topological graph of the scene to perform exploration with the help of weighted frontiers. Our hierarchical approach uses a combination of motion planning algorithms to reach point goals within explored locations and object navigation policies for moving towards semantic targets at unknown locations. Experiments on both our proposed Habitat transport task and on MultiOn benchmarks show that our method significantly outperforms baselines and prior works. Further, we validate the effectiveness of our modular approach for long-horizon transport by demonstrating meaningful generalization to much harder transport scenes with training only on simpler versions of the task.
CV, Apr 30
PhyCo: Learning Controllable Physical Priors for Generative Motion
Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan et al.
Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded control into video generation. Our approach integrates three key components: (i) a large-scale dataset of over 100K photorealistic simulation videos where friction, restitution, deformation, and force are systematically varied across diverse scenarios; (ii) physics-supervised fine-tuning of a pretrained diffusion model using a ControlNet conditioned on pixel-aligned physical property maps; and (iii) VLM-guided reward optimization, where a fine-tuned vision-language model evaluates generated videos with targeted physics queries and provides differentiable feedback. This combination enables a generative model to produce physically consistent and controllable outputs through variations in physical attributes, without any simulator or geometry reconstruction at inference. On the Physics-IQ benchmark, PhyCo significantly improves physical realism over strong baselines, and human studies confirm clearer and more faithful control over physical attributes. Our results demonstrate a scalable path toward physically consistent, controllable generative video models that generalize beyond synthetic training environments.
SY, Apr 27
Reduced-Order Data Assimilation for Thermospheric Density Using Physics-informed SINDyc Models
Sriram Narayanan, Daniele Sicoli, Piyush Mehta
Accurate estimation of thermospheric mass density is a prerequisite for orbit prediction and space situational awareness, where the upper atmosphere responds nonlinearly to solar and geomagnetic forcing across several orders of magnitude. Physics-based general circulation models resolve this response but are computationally expensive, while empirical models run cheaply but lack a time-evolving atmospheric state. This work couples a data-driven reduced-order thermospheric model with a Kalman filter that assimilates in situ density observations. An autoregressive Sparse Identification of Nonlinear Dynamics with control (SINDy$_c$-AR) reduced-order model derived from the Thermosphere-Ionosphere-Electrodynamics General Circulation Model (TIE-GCM) captures the dominant modes of variability and their dependence on solar and geomagnetic drivers at a fraction of the parent model's cost. Density observations from CHAMP, GRACE, GRACE-FO, GOCE, and Swarm are assimilated across a range of orbital configurations and geomagnetic conditions, with a linear DMDc model evaluated as a reference. Assimilation reduces density estimation error relative to open-loop predictions, most visibly during geomagnetic storms and under single-satellite coverage. SINDy$_c$-AR and DMDc perform comparably on assimilated orbits; on withheld orbits, SINDy$_c$-AR is more accurate in the in-training scenarios while DMDc is better in the out-of-training 2024 Swarm-C case. Benchmarks against NRLMSIS 2.1 and HASDM (2000-2019, where available) show that empirical references can outperform the assimilated model far from the assimilated track, so results are framed as improvements over the open-loop forecast.
SY, Apr 21
State Forecasting in an Estimation Framework with Surrogate Sensor Modeling
Sriram Narayanan, Mohamed Naveed Gul Mohamed, Ishan Paranjape et al.
In recent years, breakthroughs in computational power and data availability have revolutionized our ability to analyze complex physical systems through the inverse problem approach. Data-driven techniques like system identification and machine learning play an important role in this field, allowing us to gain insights into previously inaccessible phenomena. However, a major hurdle remains: how can meaningful information be extracted from partial measurements? In the aerospace domain, the challenge of state estimation is particularly pronounced due to the limited availability of observational data and the constraints imposed by sensor capabilities for tracking resident space objects (RSOs). To address these limitations, advanced compensation methodologies are required. Currently, range and bearing measurements obtained from radar and optical systems constitute the primary observational tools in the space situational awareness (SSA) community. In this work, we propose a novel framework that integrates a simplified reference dynamics model with a data-driven surrogate measurement model. This fusion process leverages the strengths of both models to estimate complex dynamical behaviors under conditions of partial observability. Extensive numerical experiments were conducted across multiple datasets to validate the proposed framework. The results demonstrate its efficacy in accurately reconstructing system dynamics from incomplete measurement data. Furthermore, to ensure the robustness of the framework, an initial consistency analysis of the surrogate modeling approach is presented. By addressing the current challenges and refining the integration of data-driven techniques with traditional physics-based modeling, this framework aims to advance state estimation methodologies in the aerospace sector.
CV, Sep 14, 2025
Dual Band Video Thermography Near Ambient Conditions
Sriram Narayanan, Mani Ramanagopal, Srinivasa G. Narasimhan
Long-wave infrared radiation captured by a thermal camera consists of two components: (a) light from the environment reflected or transmitted by a surface, and (b) light emitted by the surface after undergoing heat transport through the object and exchanging heat with the surrounding environment. Separating these components is essential for understanding object properties such as emissivity, temperature, reflectance and shape. Previous thermography studies often assume that only one component is dominant (e.g., in welding) or that the second component is constant and can be subtracted. However, in near-ambient conditions, which are most relevant to computer vision applications, both components are typically comparable in magnitude and vary over time. We introduce the first method that separates reflected and emitted components of light in videos captured by two thermal cameras with different spectral sensitivities. We derive a dual-band thermal image formation model and develop algorithms to estimate the surface's emissivity and its time-varying temperature while isolating a dynamic background. We quantitatively evaluate our approach using carefully calibrated emissivities for a range of materials and show qualitative results on complex everyday scenes, such as a glass filled with hot liquid and people moving in the background.
CV, Feb 10, 2025
Indoor Heat Estimation from a Single Visible-Light Panorama
Guanzhou Ji, Sriram Narayanan, Azadeh Sawyer et al.
This paper introduces a novel image-based rendering technique for jointly estimating indoor lighting and thermal conditions from paired indoor-outdoor high dynamic range (HDR) panoramas. Our method uses the indoor panorama to estimate the 3D floor layout, while the corresponding outdoor panorama serves as an environment map to infer spatially-varying illumination and material properties. Assuming indoor surfaces are Lambertian and that all heat originates from outdoor visible light, we model the relationship between light transport and heat transfer, and perform transient heat simulations to generate indoor temperature distributions. The simulated heat maps are validated against real-world thermal images captured with an infrared camera. This approach supports photorealistic and physically informed visualization, enabling integrated light and heat estimation to advance traditional virtual home staging.
CV, Apr 16, 2021
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
Sriram Narayanan, Ramin Moslemi, Francesco Pittaluga et al.
Trajectory prediction is a safety-critical tool for autonomous vehicles to plan and execute actions. Our work addresses two key challenges in trajectory prediction: learning multimodal outputs, and improving predictions by imposing constraints using driving knowledge. Recent methods have achieved strong performance using Multi-Choice Learning objectives like winner-takes-all (WTA) or best-of-many. However, the impact of those methods on learning diverse hypotheses is under-studied, as such objectives depend heavily on their initialization for diversity. As our first contribution, we propose a novel Divide-And-Conquer (DAC) approach that acts as a better initialization technique for the WTA objective, resulting in diverse outputs without any spurious modes. Our second contribution is a novel trajectory prediction framework called ALAN that uses existing lane centerlines as anchors to provide trajectories constrained to the input lanes. Our framework provides multi-agent trajectory outputs in a forward pass by capturing interactions through hypercolumn descriptors and incorporating scene information in the form of rasterized images and per-agent lane anchors. Experiments on synthetic and real data show that the proposed DAC captures the data distribution better than other objectives in the WTA family. Further, we show that our ALAN approach provides performance on par with or better than SOTA methods on the nuScenes urban driving benchmark.