Sougata Sen

h-index: 16

4 papers

195 citations

Novelty: 35%

AI Score: 44

Ranked #45,678 of 194,257 authors (top 24%); #16,043 in CV (top 27%)

4 Papers

4.2 · CV · May 31

Rank-Aware Quantile Activation for Motion-Robust Crop Segmentation in UAV Imagery

Abinav Kiran, Sravan Danda, Aditya Challa et al.

Motion blur from high-speed UAV acquisition degrades semantic segmentation on rare texture-dependent classes with high agronomic value. Standard CNNs rely on high-frequency magnitude features that blur destroys, causing statistical erasure of minority signals. We propose Dual Quantile Activation (QAct), a rank-aware block replacing magnitude gating with instance-level rank normalization. Evaluated on Agriculture-Vision 2021 across zero-shot and blur-supervised regimes at multiple severities, QAct is the dominant architectural factor: it delivers consistent mIoU gains over ReLU across both regimes and all severities, with the strongest gains on rare structural and texture-dependent classes. Some dominant classes (water, planter skip) show mixed per-class performance under distillation. At moderate blur, zero-shot QAct outperforms distillation-trained ReLU; across all severities, Distill-QAct achieves the best performance, confirming that rank-aware activation and blur-domain training are complementary sources of robustness.
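
To illustrate the general idea behind rank normalization mentioned in this abstract, the sketch below maps each activation in a single instance to its rank-based quantile. It is not the paper's QAct block, whose details the abstract does not give; the function name `rank_normalize` and the toy inputs are hypothetical. It only shows why a rank-based output can stay unchanged when blur shrinks magnitudes but preserves their ordering.

```python
def rank_normalize(values):
    """Map each value to a rank-based quantile in (0, 1) within one instance.

    Hypothetical helper for illustration; not the QAct implementation.
    """
    n = len(values)
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    quantiles = [0.0] * n
    for rank, idx in enumerate(order):
        # Mid-rank quantile: depends only on ordering, not on magnitude.
        quantiles[idx] = (rank + 0.5) / n
    return quantiles


# Toy activations: "blurred" has much smaller magnitudes but the same ordering.
sharp = [0.1, 4.0, 0.3, 9.0]
blurred = [0.05, 0.8, 0.1, 1.5]

print(rank_normalize(sharp))    # [0.125, 0.625, 0.375, 0.875]
print(rank_normalize(blurred))  # [0.125, 0.625, 0.375, 0.875]
```

Because both inputs share the same ordering, their rank-normalized outputs are identical, whereas a magnitude-gated activation such as ReLU would pass the reduced magnitudes straight through.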

14.4 · CV · Apr 13, 2025 · Code

Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding

Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.

3-Dimensional Embodied Reference Understanding (3D-ERU) combines a language description and an accompanying pointing gesture to identify the most relevant target object in a 3D scene. Although prior work has explored pure language-based 3D grounding, there has been limited exploration of 3D-ERU, which also incorporates human pointing gestures. To address this gap, we introduce a data augmentation framework, Imputer, and use it to curate a new benchmark dataset for 3D-ERU, ImputeRefer, by incorporating human pointing gestures into existing 3D scene datasets that only contain language instructions. We also propose Ges3ViG, a novel model for 3D-ERU that achieves ~30% improvement in accuracy compared to other 3D-ERU models and ~9% compared to other purely language-based 3D grounding models. Our code and dataset are available at https://github.com/AtharvMane/Ges3ViG.

17.5 · LG · Oct 25, 2021

Applications and Techniques for Fast Machine Learning in Science

Allison McCarn Deiana, Nhan Tran, Joshua Agar et al.

In this community review report, we discuss applications and techniques for fast machine learning (ML) in science: the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

19.0 · HC · Nov 17, 2019

NeckSense: A Multi-Sensor Necklace for Detecting Eating Activities in Free-Living Conditions

Shibo Zhang, Yuqi Zhao, Dzung Tri Nguyen et al.

We present the design, implementation, and evaluation of a multi-sensor low-power necklace, 'NeckSense', for automatically and unobtrusively capturing fine-grained information about an individual's eating activity and eating episodes across an entire waking day in a naturalistic setting. NeckSense fuses and classifies the proximity of the necklace from the chin, the ambient light, the Lean Forward Angle, and the energy signals to determine chewing sequences, a building block of the eating activity. It then clusters the identified chewing sequences to determine eating episodes. We tested NeckSense with 11 obese and 9 non-obese participants across two studies, where we collected more than 470 hours of data in a naturalistic setting. Our results demonstrate that NeckSense enables reliable eating detection for an entire waking day, even in free-living environments. Overall, our system achieves an F1-score of 81.6% in detecting eating episodes in an exploratory study. Moreover, our system can achieve an F1-score of 77.1% for episodes even in an all-day-around free-living setting. With more than 15.8 hours of battery life, NeckSense will allow researchers and dietitians to better understand natural chewing and eating behaviors, and also enable real-time interventions.
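
The abstract says that chewing sequences are clustered into eating episodes but does not say how. As a rough sketch of one possible approach, the example below groups chewing-sequence timestamps into an episode whenever consecutive sequences are close in time. The function name `cluster_episodes`, the gap-based rule, and the 900-second threshold are all assumptions made for illustration, not the authors' method or parameters.

```python
def cluster_episodes(chewing_times, max_gap_s=900.0):
    """Group chewing-sequence timestamps (seconds) into (start, end) episodes.

    Assumed rule: a chewing sequence joins the current episode if it starts
    within max_gap_s of the previous sequence; otherwise a new episode begins.
    """
    episodes = []
    for t in sorted(chewing_times):
        if episodes and t - episodes[-1][1] <= max_gap_s:
            # Close enough to the previous sequence: extend the episode.
            episodes[-1][1] = t
        else:
            # Too far from the previous sequence: start a new episode.
            episodes.append([t, t])
    return [tuple(e) for e in episodes]


# Toy timestamps: three sequences around one meal, two around a later one.
print(cluster_episodes([0, 60, 120, 4000, 4100]))
# [(0, 120), (4000, 4100)]
```

A smaller `max_gap_s` splits a meal with long pauses into several episodes, while a larger one merges nearby snacks, so in practice the threshold would need to be chosen against labeled data.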