Savinay Nagendra

CV
h-index5
7papers
86citations
Novelty43%
AI Score28

7 Papers

CVNov 12, 2022
PatchRefineNet: Improving Binary Segmentation by Incorporating Signals from Optimal Patch-wise Binarization

Savinay Nagendra, Chaopeng Shen, Daniel Kifer

The purpose of binary segmentation models is to determine which pixels belong to an object of interest (e.g., which pixels in an image are part of roads). The models assign a logit score (i.e., probability) to each pixel and these are converted into predictions by thresholding (i.e., each pixel with logit score $\geq τ$ is predicted to be part of a road). However, a common phenomenon in current and former state-of-the-art segmentation models is spatial bias -- in some patches, the logit scores are consistently biased upwards and in others they are consistently biased downwards. These biases cause false positives and false negatives in the final predictions. In this paper, we propose PatchRefineNet (PRN), a small network that sits on top of a base segmentation model and learns to correct its patch-specific biases. Across a wide variety of base models, PRN consistently helps them improve mIoU by 2-3\%. One of the key ideas behind PRN is the addition of a novel supervision signal during training. Given the logit scores produced by the base segmentation model, each pixel is given a pseudo-label that is obtained by optimally thresholding the logit scores in each image patch. Incorporating these pseudo-labels into the loss function of PRN helps correct systematic biases and reduce false positives/negatives. Although we mainly focus on binary segmentation, we also show how PRN can be extended to saliency detection and few-shot segmentation. We also discuss how the ideas can be extended to multiclass segmentation.

CVNov 18, 2023
Estimating Uncertainty in Landslide Segmentation Models

Savinay Nagendra, Chaopeng Shen, Daniel Kifer

Landslides are a recurring, widespread hazard. Preparation and mitigation efforts can be aided by a high-quality, large-scale dataset that covers global at-risk areas. Such a dataset currently does not exist and is impossible to construct manually. Recent automated efforts focus on deep learning models for landslide segmentation (pixel labeling) from satellite imagery. However, it is also important to characterize the uncertainty or confidence levels of such segmentations. Accurate and robust uncertainty estimates can enable low-cost (in terms of manual labor) oversight of auto-generated landslide databases to resolve errors, identify hard negative examples, and increase the size of labeled training data. In this paper, we evaluate several methods for assessing pixel-level uncertainty of the segmentation. Three methods that do not require architectural changes were compared, including Pre-Threshold activations, Monte-Carlo Dropout and Test-Time Augmentation -- a method that measures the robustness of predictions in the face of data augmentation. Experimentally, the quality of the latter method was consistently higher than the others across a variety of models and metrics in our dataset.

CVDec 16, 2024
SAMIC: Segment Anything with In-Context Spatial Prompt Engineering

Savinay Nagendra, Kashif Rashid, Chaopeng Shen et al.

Few-shot segmentation is the problem of learning to identify specific types of objects (e.g., airplanes) in images from a small set of labeled reference images. The current state of the art is driven by resource-intensive construction of models for every new domain-specific application. Such models must be trained on enormous labeled datasets of unrelated objects (e.g., cars, trains, animals) so that their ``knowledge'' can be transferred to new types of objects. In this paper, we show how to leverage existing vision foundation models (VFMs) to reduce the incremental cost of creating few-shot segmentation models for new domains. Specifically, we introduce SAMIC, a small network that learns how to prompt VFMs in order to segment new types of objects in domain-specific applications. SAMIC enables any task to be approached as a few-shot learning problem. At 2.6 million parameters, it is 94% smaller than the leading models (e.g., having ResNet 101 backbone with 45+ million parameters). Even using 1/5th of the training data provided by one-shot benchmarks, SAMIC is competitive with, or sets the state of the art, on a variety of few-shot and semantic segmentation datasets including COCO-$20^i$, Pascal-$5^i$, PerSeg, FSS-1000, and NWPU VHR-10.

CVMar 24, 2024
Emotion Recognition from the perspective of Activity Recognition

Savinay Nagendra, Prapti Panigrahi

Applications of an efficient emotion recognition system can be found in several domains such as medicine, driver fatigue surveillance, social robotics, and human-computer interaction. Appraising human emotional states, behaviors, and reactions displayed in real-world settings can be accomplished using latent continuous dimensions. Continuous dimensional models of human affect, such as those based on valence and arousal are more accurate in describing a broad range of spontaneous everyday emotions than more traditional models of discrete stereotypical emotion categories (e.g. happiness, surprise). Most of the prior work on estimating valence and arousal considers laboratory settings and acted data. But, for emotion recognition systems to be deployed and integrated into real-world mobile and computing devices, we need to consider data collected in the world. Action recognition is a domain of Computer Vision that involves capturing complementary information on appearance from still frames and motion between frames. In this paper, we treat emotion recognition from the perspective of action recognition by exploring the application of deep learning architectures specifically designed for action recognition, for continuous affect recognition. We propose a novel three-stream end-to-end deep learning regression pipeline with an attention mechanism, which is an ensemble design based on sub-modules of multiple state-of-the-art action recognition systems. The pipeline constitutes a novel data pre-processing approach with a spatial self-attention mechanism to extract keyframes. The optical flow of high-attention regions of the face is extracted to capture temporal context. AFEW-VA in-the-wild dataset has been used to conduct comparative experiments. Quantitative analysis shows that the proposed model outperforms multiple standard baselines of both emotion recognition and action recognition models.

CVMar 31, 2025
SmartScan: An AI-based Interactive Framework for Automated Region Extraction from Satellite Images

Savinay Nagendra, Kashif Rashid

The deployment of a continuous methane monitoring system requires determining the optimal number and placement of fixed sensors. However, planning is labor-intensive, requiring extensive site setup and iteration to meet client restrictions. This challenge is amplified when evaluating multiple sites, limiting scalability. To address this, we introduce SmartScan, an AI framework that automates data extraction for optimal sensor placement. SmartScan identifies subspaces of interest from satellite images using an interactive tool to create facility-specific constraint sets efficiently. SmartScan leverages the Segment Anything Model (SAM), a prompt-based transformer for zero-shot segmentation, enabling subspace extraction without explicit training. It operates in two modes: (1) Data Curation Mode, where satellite images are processed to extract high-quality subspaces using an interactive prompting system for SAM, and (2) Autonomous Mode, where user-curated prompts train a deep learning network to replace manual prompting, fully automating subspace extraction. The interactive tool also serves for quality control, allowing users to refine AI-generated outputs and generate additional constraint sets as needed. With its AI-driven prompting mechanism, SmartScan delivers high-throughput, high-quality subspace extraction with minimal human intervention, enhancing scalability and efficiency. Notably, its adaptable design makes it suitable for extracting regions of interest from ultra-high-resolution satellite imagery across various domains.

CVNov 30, 2018
Learning Dynamics from Kinematics: Estimating 2D Foot Pressure Maps from Video Frames

Christopher Funk, Savinay Nagendra, Jesse Scott et al.

Pose stability analysis is the key to understanding locomotion and control of body equilibrium, with applications in numerous fields such as kinesiology, medicine, and robotics. In biomechanics, Center of Pressure (CoP) is used in studies of human postural control and gait. We propose and validate a novel approach to learn CoP from pose of a human body to aid stability analysis. More specifically, we propose an end-to-end deep learning architecture to regress foot pressure heatmaps, and hence the CoP locations, from 2D human pose derived from video. We have collected a set of long (5min +) choreographed Taiji (Tai Chi) sequences of multiple subjects with synchronized foot pressure and video data. The derived human pose data and corresponding foot pressure maps are used jointly in training a convolutional neural network with residual architecture, named PressNET. Cross-subject validation results show promising performance of PressNET, significantly outperforming the baseline method of K-Nearest Neighbors. Furthermore, we demonstrate that our computation of center of pressure (CoP) from PressNET is not only significantly more accurate than those obtained from the baseline approach but also meets the expectations of corresponding lab-based measurements of stability studies in kinesiology.

LGOct 3, 2018
Comparison of Reinforcement Learning algorithms applied to the Cart Pole problem

Savinay Nagendra, Nikhil Podila, Rashmi Ugarakhod et al.

Designing optimal controllers continues to be challenging as systems are becoming complex and are inherently nonlinear. The principal advantage of reinforcement learning (RL) is its ability to learn from the interaction with the environment and provide optimal control strategy. In this paper, RL is explored in the context of control of the benchmark cartpole dynamical system with no prior knowledge of the dynamics. RL algorithms such as temporal-difference, policy gradient actor-critic, and value function approximation are compared in this context with the standard LQR solution. Further, we propose a novel approach to integrate RL and swing-up controllers.