Nirupam Roy

h-index: 5

7 papers

13 citations

Novelty: 44%

AI Score: 39

Ranked #104,086 of 205,806 authors (top 51%); #33,822 in CV (top 57%)

7 Papers

SY · Jan 7, 2017

Modeling Actuation Constraints for IoT Applications

Bharathan Balaji, Brad Campbell, Amit Levy et al.

The Internet of Things (IoT) promises to bring ease of monitoring, better efficiency, and innovative services across many domains through the connected devices around us. With information from critical parts of infrastructure and powerful cloud-based data analytics, many applications can be developed to gain insights about IoT systems as well as transform their capabilities. Actuation applications form an essential part of these IoT systems, as they enable automation as well as fast low-level decision making. However, modern IoT systems are designed for data acquisition, and actuation applications are implemented in an ad-hoc manner. We identify systematic modeling of constraints as indispensable for supporting actuation applications, because constraints encompass high-level policies dictated by laws of physics, legal policies, and user preferences. We explore data models for constraints in IoT systems with the example of a home heating system and illustrate the challenges in enforcing these constraints in the IoT system architecture.

59.9 · CV · Mar 16

FEEL (Force-Enhanced Egocentric Learning): A Dataset for Physical Action Understanding

Eadom Dessalene, Botao He, Michael Maynord et al.

We introduce FEEL (Force-Enhanced Egocentric Learning), the first large-scale dataset pairing force measurements gathered from custom piezoresistive gloves with egocentric video. Our gloves enable scalable data collection, and FEEL contains approximately 3 million force-synchronized frames of natural unscripted manipulation in kitchen environments, with 45% of frames involving hand-object contact. Because force is the underlying cause that drives physical interaction, it is a critical primitive for physical action understanding. We demonstrate the utility of force for physical action understanding by applying FEEL to two families of tasks: (1) contact understanding, where we jointly perform temporal contact segmentation and pixel-level contacted object segmentation; and (2) action representation learning, where force prediction serves as a self-supervised pretraining objective for video backbones. We achieve state-of-the-art temporal contact segmentation results and competitive pixel-level segmentation results without any need for manual contacted object segmentation annotations. Furthermore, we demonstrate that action representation learning with FEEL improves transfer performance on action understanding tasks without any manual labels over EPIC-Kitchens, SomethingSomething-V2, EgoExo4D and Meccano.

SD · Apr 11, 2025

Spatial Audio Processing with Large Language Model on Wearable Devices

Ayushi Mishra, Yang Bai, Priyadarshan Narayanasamy et al.

Integrating spatial context into large language models (LLMs) has the potential to revolutionize human-computer interaction, particularly in wearable devices. In this work, we present a novel system architecture that incorporates spatial speech understanding into LLMs, enabling contextually aware and adaptive applications for wearable technologies. Our approach leverages microstructure-based spatial sensing to extract precise Direction of Arrival (DoA) information using a monaural microphone. To address the lack of existing datasets for microstructure-assisted speech recordings, we synthetically create a dataset called OmniTalk using the LibriSpeech dataset. This spatial information is fused with linguistic embeddings from OpenAI's Whisper model, allowing each modality to learn complementary contextual representations. The fused embeddings are aligned with the input space of the LLaMA-3.2 3B model and fine-tuned with the lightweight adaptation technique LoRA to optimize for on-device processing. SING supports spatially-aware automatic speech recognition (ASR), achieving a mean error of $25.72^\circ$ (a substantial improvement compared to the $88.52^\circ$ median error in existing work) with a word error rate (WER) of 5.3. SING also supports soundscaping, for example, inferring how many people were talking and their directions, with up to 5 people and a median DoA error of $16^\circ$. Our system demonstrates superior performance in spatial speech understanding while addressing the challenges of power efficiency, privacy, and hardware constraints, paving the way for advanced applications in augmented reality, accessibility, and immersive experiences.

LG · Feb 14, 2024

IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture

Varun Ramani, Hossein Khayami, Yang Bai et al.

This paper presents a novel approach for predicting human poses using IMU data, diverging from previous studies such as DIP-IMU, IMUPoser, and TransPose, which use up to 6 IMUs in conjunction with bidirectional RNNs. We introduce two main innovations: a data-driven strategy for optimal IMU placement and a transformer-based model architecture for time series analysis. Our findings indicate not only that our approach outperforms traditional 6-IMU biRNN models, but also that the transformer architecture significantly enhances pose reconstruction from data obtained from 24 IMU locations, with equivalent performance to biRNNs when using only 6 IMUs. The enhanced accuracy provided by our optimally chosen locations, when coupled with the parallelizability and performance of transformers, provides significant improvements to the field of IMU-based pose estimation.

CV · Mar 30, 2025

SpINR: Neural Volumetric Reconstruction for FMCW Radars

Harshvardhan Takawale, Nirupam Roy

In this paper, we introduce SpINR, a novel framework for volumetric reconstruction using Frequency-Modulated Continuous-Wave (FMCW) radar data. Traditional radar imaging techniques, such as backprojection, often assume ideal signal models and require dense aperture sampling, leading to limitations in resolution and generalization. To address these challenges, SpINR integrates a fully differentiable forward model that operates natively in the frequency domain with implicit neural representations (INRs). This integration leverages the linear relationship between beat frequency and scatterer distance inherent in FMCW radar systems, facilitating more efficient and accurate learning of scene geometry. Additionally, by computing outputs for only the relevant frequency bins, our forward model achieves greater computational efficiency compared to time-domain approaches that process the entire signal before transformation. Through extensive experiments, we demonstrate that SpINR significantly outperforms classical backprojection methods and existing learning-based approaches, achieving higher resolution and more accurate reconstructions of complex scenes. This work represents the first application of neural volumetric reconstruction in the radar domain, offering a promising direction for future research in radar-based imaging and perception systems.
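The linear relationship between beat frequency and scatterer distance mentioned in the abstract above can be illustrated with the standard textbook relation for a linear FMCW chirp, f_b = 2 S R / c, where S is the chirp slope (bandwidth divided by chirp duration). The sketch below is not taken from the SpINR paper; the function names and the example parameters (4 GHz bandwidth, 1 ms chirp) are assumptions chosen for illustration only.

```python
# Illustrative sketch only: the standard linear-chirp FMCW relation,
# not the SpINR forward model. Names and parameters are hypothetical.

C = 299_792_458.0  # speed of light in m/s


def range_to_beat_frequency(range_m: float, bandwidth_hz: float, chirp_s: float) -> float:
    """Beat frequency (Hz) produced by a single scatterer at range_m."""
    slope = bandwidth_hz / chirp_s  # chirp slope S in Hz per second
    return 2.0 * slope * range_m / C


def beat_frequency_to_range(beat_hz: float, bandwidth_hz: float, chirp_s: float) -> float:
    """Scatterer range (m) recovered from a measured beat frequency."""
    slope = bandwidth_hz / chirp_s
    return C * beat_hz / (2.0 * slope)


if __name__ == "__main__":
    # Assumed example radar: 4 GHz sweep over 1 ms.
    bandwidth_hz, chirp_s = 4e9, 1e-3
    for r in (1.0, 2.0, 4.0):
        f_b = range_to_beat_frequency(r, bandwidth_hz, chirp_s)
        print(f"range {r:.1f} m -> beat frequency {f_b:.1f} Hz")
```

Because the mapping is linear, doubling the scatterer distance doubles the beat frequency, which is why each range of interest corresponds to a specific frequency bin.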

CV · Jun 9, 2025

SpINRv2: Implicit Neural Representation for Passband FMCW Radars

Harshvardhan Takawale, Nirupam Roy

We present SpINRv2, a neural framework for high-fidelity volumetric reconstruction using Frequency-Modulated Continuous-Wave (FMCW) radar. Extending our prior work (SpINR), this version introduces enhancements that allow accurate learning under high start frequencies, where phase aliasing and sub-bin ambiguity become prominent. Our core contribution is a fully differentiable frequency-domain forward model that captures the complex radar response using closed-form synthesis, paired with an implicit neural representation (INR) for continuous volumetric scene modeling. Unlike time-domain baselines, SpINRv2 directly supervises the complex frequency spectrum, preserving spectral fidelity while drastically reducing computational overhead. Additionally, we introduce sparsity and smoothness regularization to resolve sub-bin ambiguities that arise at fine range resolutions. Experimental results show that SpINRv2 significantly outperforms both classical and learning-based baselines, especially under high-frequency regimes, establishing a new benchmark for neural radar-based 3D imaging.

CY · Oct 16, 2024

Continuous Pupillography: A Case for Visual Health Ecosystem

Usama Younus, Nirupam Roy

This article covers pupillography and its potential use in a number of ophthalmological diagnostic applications in the biomedical space. With the ever-increasing incorporation of technology into our daily lives and ever-growing active research into smart devices and technologies, we make a case for a health ecosystem that revolves around continuous eye monitoring. We summarize the design constraints and requirements for an IoT-based continuous pupil detection system and attempt to develop a pipeline for a wearable pupillographic device, while comparing two compact mini-camera modules currently available on the market. We use a lightweight algorithm that can be directly adapted to current microcontrollers, and share our results for different lighting conditions and scenarios. Lastly, we present our findings, along with an analysis of the challenges faced and a way forward towards successfully building this ecosystem.