Jayesh Rajkumar Vachhani

CV
h-index4
4papers
9citations
Novelty57%
AI Score38

4 Papers

LGNov 3, 2022
Self Similarity Matrix based CNN Filter Pruning

S Rakshith, Jayesh Rajkumar Vachhani, Sourabh Vasant Gothe et al.

In recent years, most of the deep learning solutions are targeted to be deployed in mobile devices. This makes the need for development of lightweight models all the more imminent. Another solution is to optimize and prune regular deep learning models. In this paper, we tackle the problem of CNN model pruning with the help of Self-Similarity Matrix (SSM) computed from the 2D CNN filters. We propose two novel algorithms to rank and prune redundant filters which contribute similar activation maps to the output. One of the key features of our method is that there is no need of finetuning after training the model. Both the training and pruning process is completed simultaneously. We benchmark our method on two of the most popular CNN models - ResNet and VGG and record their performance on the CIFAR-10 dataset.

CVFeb 15, 2024
What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection

Sourabh Vasant Gothe, Vibhav Agarwal, Sourav Ghosh et al.

Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events. Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space. We explore two pivotal questions pertaining to GEBD: Can non-parametric algorithms outperform unsupervised neural methods? Does motion information alone suffice for high performance? This inquiry drives us to algorithmically harness motion cues for identifying generic event boundaries in videos. In this work, we propose FlowGEBD, a non-parametric, unsupervised technique for GEBD. Our approach entails two algorithms utilizing optical flow: (i) Pixel Tracking and (ii) Flow Normalization. By conducting thorough experimentation on the challenging Kinetics-GEBD and TAPOS datasets, our results establish FlowGEBD as the new state-of-the-art (SOTA) among unsupervised methods. FlowGEBD exceeds the neural models on the Kinetics-GEBD dataset by obtaining an F1@0.05 score of 0.713 with an absolute gain of 31.7% compared to the unsupervised baseline and achieves an average F1 score of 0.623 on the TAPOS validation dataset.

CVJun 21, 2025
PhysID: Physics-based Interactive Dynamics from a Single-view Image

Sourabh Vasant Gothe, Ayon Chattopadhyay, Gunturi Venkata Sai Phani Kiran et al.

Transforming static images into interactive experiences remains a challenging task in computer vision. Tackling this challenge holds the potential to elevate mobile user experiences, notably through interactive and AR/VR applications. Current approaches aim to achieve this either using pre-recorded video responses or requiring multi-view images as input. In this paper, we present PhysID, that streamlines the creation of physics-based interactive dynamics from a single-view image by leveraging large generative models for 3D mesh generation and physical property prediction. This significantly reduces the expertise required for engineering-intensive tasks like 3D modeling and intrinsic property calibration, enabling the process to be scaled with minimal manual intervention. We integrate an on-device physics-based engine for physically plausible real-time rendering with user interactions. PhysID represents a leap forward in mobile-based interactive dynamics, offering real-time, non-deterministic interactions and user-personalization with efficient on-device memory consumption. Experiments evaluate the zero-shot capabilities of various Multimodal Large Language Models (MLLMs) on diverse tasks and the performance of 3D reconstruction models. These results demonstrate the cohesive functioning of all modules within the end-to-end framework, contributing to its effectiveness.

CVMar 30, 2021
FONTNET: On-Device Font Understanding and Prediction Pipeline

Rakshith S, Rishabh Khurana, Vibhav Agarwal et al.

Fonts are one of the most basic and core design concepts. Numerous use cases can benefit from an in depth understanding of Fonts such as Text Customization which can change text in an image while maintaining the Font attributes like style, color, size. Currently, Text recognition solutions can group recognized text based on line breaks or paragraph breaks, if the Font attributes are known multiple text blocks can be combined based on context in a meaningful manner. In this paper, we propose two engines: Font Detection Engine, which identifies the font style, color and size attributes of text in an image and a Font Prediction Engine, which predicts similar fonts for a query font. Major contributions of this paper are three-fold: First, we developed a novel CNN architecture for identifying font style of text in images. Second, we designed a novel algorithm for predicting similar fonts for a given query font. Third, we have optimized and deployed the entire engine On-Device which ensures privacy and improves latency in real time applications such as instant messaging. We achieve a worst case On-Device inference time of 30ms and a model size of 4.5MB for both the engines.