Chaitanya Kapoor

CV
h-index3
4papers
8citations
Novelty36%
AI Score33

4 Papers

ROOct 27, 2022
Handwashing Action Detection System for an Autonomous Social Robot

Sreejith Sasidharan, Pranav Prabha, Devasena Pasupuleti et al.

Young children are at an increased risk of contracting contagious diseases such as COVID-19 due to improper hand hygiene. An autonomous social agent that observes children while handwashing and encourages good hand washing practices could provide an opportunity for handwashing behavior to become a habit. In this article, we present a human action recognition system, which is part of the vision system of a social robot platform, to assist children in developing a correct handwashing technique. A modified convolution neural network (CNN) architecture with Channel Spatial Attention Bilinear Pooling (CSAB) frame, with a VGG-16 architecture as the backbone is trained and validated on an augmented dataset. The modified architecture generalizes well with an accuracy of 90% for the WHO-prescribed handwashing steps even in an unseen environment. Our findings indicate that the approach can recognize even subtle hand movements in the video and can be used for gesture detection and classification in social robotics.

LGFeb 22
Partial Soft-Matching Distance for Neural Representational Comparison with Partial Unit Correspondence

Chaitanya Kapoor, Alex H. Williams, Meenakshi Khosla

Representational similarity metrics typically force all units to be matched, making them susceptible to noise and outliers common in neural representations. We extend the soft-matching distance to a partial optimal transport setting that allows some neurons to remain unmatched, yielding rotation-sensitive but robust correspondences. This partial soft-matching distance provides theoretical advantages -- relaxing strict mass conservation while maintaining interpretable transport costs -- and practical benefits through efficient neuron ranking in terms of cross-network alignment without costly iterative recomputation. In simulations, it preserves correct matches under outliers and reliably selects the correct model in noise-corrupted identification tasks. On fMRI data, it automatically excludes low-reliability voxels and produces voxel rankings by alignment quality that closely match computationally expensive brute-force approaches. It achieves higher alignment precision across homologous brain areas than standard soft-matching, which is forced to match all units regardless of quality. In deep networks, highly matched units exhibit similar maximally exciting images, while unmatched units show divergent patterns. This ability to partition by match quality enables focused analyses, e.g., testing whether networks have privileged axes even within their most aligned subpopulations. Overall, partial soft-matching provides a principled and practical method for representational comparison under partial correspondence.

CVMar 22, 2022
Dense Residual Networks for Gaze Mapping on Indian Roads

Chaitanya Kapoor, Kshitij Kumar, Soumya Vishnoi et al.

In the recent past, greater accessibility to powerful computational resources has enabled progress in the field of Deep Learning and Computer Vision to grow by leaps and bounds. This in consequence has lent progress to the domain of Autonomous Driving and Navigation Systems. Most of the present research work has been focused on driving scenarios in the European or American roads. Our paper draws special attention to the Indian driving context. To this effect, we propose a novel architecture, DR-Gaze, which is used to map the driver's gaze onto the road. We compare our results with previous works and state-of-the-art results on the DGAZE dataset. Our code will be made publicly available upon acceptance of our paper.

NCFeb 26, 2025
Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts

Chaitanya Kapoor, Sudhanshu Srivastava, Meenakshi Khosla

Understanding convergent learning -- the degree to which independently trained neural systems -- whether multiple artificial networks or brains and models -- arrive at similar internal representations -- is crucial for both neuroscience and AI. Yet, the literature remains narrow in scope -- typically examining just a handful of models with one dataset, relying on one alignment metric, and evaluating networks at a single post-training checkpoint. We present a large-scale audit of convergent learning, spanning dozens of vision models and thousands of layer-pair comparisons, to close these long-standing gaps. First, we pit three alignment families against one another -- linear regression (affine-invariant), orthogonal Procrustes (rotation-/reflection-invariant), and permutation/soft-matching (unit-order-invariant). We find that orthogonal transformations align representations nearly as effectively as more flexible linear ones, and although permutation scores are lower, they significantly exceed chance, indicating a privileged representational basis. Tracking convergence throughout training further shows that nearly all eventual alignment crystallizes within the first epoch -- well before accuracy plateaus -- indicating it is largely driven by shared input statistics and architectural biases, not by the final task solution. Finally, when models are challenged with a battery of out-of-distribution images, early layers remain tightly aligned, whereas deeper layers diverge in proportion to the distribution shift. These findings fill critical gaps in our understanding of representational convergence, with implications for neuroscience and AI.