Maheswar Bora

CV
h-index29
5papers
3citations
Novelty47%
AI Score44

5 Papers

CVFeb 2Code
DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation

Tushar Anand, Maheswar Bora, Antitza Dantcheva et al.

In this work, we propose a novel Mamba block DenVisCoM, as well as a novel hybrid architecture specifically tailored for accurate and real-time estimation of optical flow and disparity estimation. Given that such multi-view geometry and motion tasks are fundamentally related, we propose a unified architecture to tackle them jointly. Specifically, the proposed hybrid architecture is based on DenVisCoM and a Transformer-based attention block that efficiently addresses real-time inference, memory footprint, and accuracy at the same time for joint estimation of motion and 3D dense perception tasks. We extensively analyze the benchmark trade-off of accuracy and real-time processing on a large number of datasets. Our experimental results and related analysis suggest that our proposed model can accurately estimate optical flow and disparity estimation in real time. All models and associated code are available at https://github.com/vimstereo/DenVisCoM.

CVDec 21, 2024Code
ViM-Disparity: Bridging the Gap of Speed, Accuracy and Memory for Disparity Map Generation

Maheswar Bora, Tushar Anand, Saurabh Atreya et al.

In this work we propose a Visual Mamba (ViM) based architecture, to dissolve the existing trade-off for real-time and accurate model with low computation overhead for disparity map generation (DMG). Moreover, we proposed a performance measure that can jointly evaluate the inference speed, computation overhead and the accurateness of a DMG model. The code implementation and corresponding models are available at: https://github.com/MBora/ViM-Disparity.

CVJan 5, 2024
Enhancing 3D-Air Signature by Pen Tip Tail Trajectory Awareness: Dataset and Featuring by Novel Spatio-temporal CNN

Saurabh Atreya, Maheswar Bora, Aritra Mukherjee et al.

This work proposes a novel process of using pen tip and tail 3D trajectory for air signature. To acquire the trajectories we developed a new pen tool and a stereo camera was used. We proposed SliT-CNN, a novel 2D spatial-temporal convolutional neural network (CNN) for better featuring of the air signature. In addition, we also collected an air signature dataset from $45$ signers. Skilled forgery signatures per user are also collected. A detailed benchmarking of the proposed dataset using existing techniques and proposed CNN on existing and proposed dataset exhibit the effectiveness of our methodology.

CVNov 27, 2025
Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition

Maheswar Bora, Tashvik Dhamija, Shukesh Reddy et al.

Deepfake generation has witnessed remarkable progress, contributing to highly realistic generated images, videos, and audio. While technically intriguing, such progress has raised serious concerns related to the misuse of manipulated media. To mitigate such misuse, robust and reliable deepfake detection is urgently needed. Towards this, we propose a novel network FauxNet, which is based on pre-trained Visual Speech Recognition (VSR) features. By extracting temporal VSR features from videos, we identify and segregate real videos from manipulated ones. The holy grail in this context has to do with zero-shot detection, i.e., generalizable detection, which we focus on in this work. FauxNet consistently outperforms the state-of-the-art in this setting. In addition, FauxNet is able to attribute - distinguish between generation techniques from which the videos stem. Finally, we propose new datasets, referred to as Authentica-Vox and Authentica-HDTF, comprising about 38,000 real and fake videos in total, the latter created with six recent deepfake generation techniques. We provide extensive analysis and results on the Authentica datasets and FaceForensics++, demonstrating the superiority of FauxNet. The Authentica datasets will be made publicly available.

CVNov 19, 2024
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder

Maheswar Bora, Saurabh Atreya, Aritra Mukherjee et al.

In this work, we attempted to extend the thought and showcase a way forward for the Self-supervised Learning (SSL) learning paradigm by combining contrastive learning, self-distillation (knowledge distillation) and masked data modelling, the three major SSL frameworks, to learn a joint and coordinated representation. The proposed technique of SSL learns by the collaborative power of different learning objectives of SSL. Hence to jointly learn the different SSL objectives we proposed a new SSL architecture KDC-MAE, a complementary masking strategy to learn the modular correspondence, and a weighted way to combine them coordinately. Experimental results conclude that the contrastive masking correspondence along with the KD learning objective has lent a hand to performing better learning for multiple modalities over multiple tasks.