Debabrata Pal

CV
5 papers
34 citations
Novelty 51%
AI Score 39

5 Papers

CV · Sep 23, 2023
HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues

Ankit Jha, Debabrata Pal, Mainak Singha et al. · deepmind

Recognition of remote sensing (RS) or aerial images is currently of great interest, and advances in deep learning algorithms have given it further momentum in recent years. Occlusion, intra-class variance, lighting changes, and similar issues can arise when training neural networks on unimodal RS visual input. Even though joint training of audio-visual modalities improves classification performance in a low-data regime, it has yet to be thoroughly investigated in the RS domain. Here, we aim to solve a novel problem where both the audio and visual modalities are present during the meta-training of a few-shot learning (FSL) classifier; however, one of the modalities might be missing during the meta-testing stage. This problem formulation is pertinent in the RS domain, given the difficulties in data acquisition or sensor malfunctioning. To mitigate this, we propose a novel few-shot generative framework, Hallucinated Audio-Visual Embeddings-Network (HAVE-Net), to meta-train cross-modal features from limited unimodal data. Specifically, these hallucinated features are meta-learned from base classes and used for few-shot classification on novel classes during the inference phase. The experimental results on the benchmark ADVANCE and AudioSetZSL datasets show that our hallucinated modality augmentation strategy for few-shot classification outperforms a classifier trained with real multimodal information by at least 0.8-2%.

CV · Feb 18, 2023
MultiScale Probability Map guided Index Pooling with Attention-based learning for Road and Building Segmentation

Shirsha Bose, Ritesh Sur Chowdhury, Debabrata Pal et al.

Efficient road and building footprint extraction from satellite images is central to many remote sensing applications. However, precise segmentation map extraction is quite challenging due to the diverse building structures camouflaged by trees, similar spectral responses between the roads and buildings, and occlusions by heterogeneous traffic over the roads. Existing convolutional neural network (CNN)-based methods focus on either enriched spatial semantics learning for building extraction or fine-grained road topology extraction. The profound semantic information loss caused by the traditional pooling mechanisms in CNNs generates fragmented and disconnected road maps and poorly segmented boundaries for densely spaced small buildings in complex surroundings. In this paper, we propose a novel attention-aware segmentation framework, Multi-Scale Supervised Dilated Multiple-Path Attention Network (MSSDMPA-Net), equipped with two new modules, Dynamic Attention Map Guided Index Pooling (DAMIP) and Dynamic Attention Map Guided Spatial and Channel Attention (DAMSCA), to precisely extract building footprints and road maps from remotely sensed images. DAMIP mines the salient features by employing a novel index pooling mechanism to retain important geometric information. DAMSCA, on the other hand, simultaneously extracts the multi-scale spatial and spectral features. In addition, using dilated convolution and multi-scale deep supervision in optimizing MSSDMPA-Net helps achieve stellar performance. Experimental results over multiple benchmark building and road extraction datasets establish MSSDMPA-Net as the state-of-the-art (SOTA) method for building and road extraction.
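
To illustrate what pooling with retained indices means in general, here is a minimal Python sketch of plain max pooling that records where each maximum came from, so pooled values can later be placed back at their original positions. This is the generic index-pooling idea only; it is not the DAMIP module, whose attention-map guidance is not specified in the abstract. The function names `max_pool_with_indices` and `unpool_with_indices` are hypothetical.

```python
def max_pool_with_indices(grid, size=2):
    """Max-pool a 2D list using non-overlapping size x size windows.

    Returns (pooled, indices), where indices[i][j] is the (row, col)
    position in `grid` of the maximum chosen for output cell (i, j).
    """
    rows, cols = len(grid) // size, len(grid[0]) // size
    pooled, indices = [], []
    for i in range(rows):
        pooled_row, index_row = [], []
        for j in range(cols):
            # Collect every value in the window together with its position.
            window = [(grid[r][c], (r, c))
                      for r in range(i * size, (i + 1) * size)
                      for c in range(j * size, (j + 1) * size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_with_indices(pooled, indices, shape):
    """Scatter pooled values back to their recorded positions; zeros elsewhere."""
    out = [[0] * shape[1] for _ in range(shape[0])]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


feature_map = [[1, 5, 2, 0],
               [3, 4, 9, 1],
               [0, 0, 7, 8],
               [6, 2, 3, 4]]
pooled, indices = max_pool_with_indices(feature_map)
restored = unpool_with_indices(pooled, indices, (4, 4))
```

Because the positions survive the downsampling step, a decoder can restore each activation to the exact pixel it came from, which is the kind of geometric information that ordinary pooling discards.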

26.0 · CG · May 2
Witness Set: A Visibility Problem in $NP\cap XP$

Satyabrata Jana, Debabrata Pal, Bodhayan Roy et al.

We study the Witness Set problem, a natural dual to the classical Art Gallery problem. In the Witness Set problem, we are given a polygon $P$ and an integer $k$ as input, and the objective is to determine whether $P$ has a witness set of size at least $k$. A point set $X$ in $P$ is called a witness set if every point in $P$ is visible from at most one point in $X$. For simple polygons, we show that Witness Set lies in both $NP$ and $XP$. This stands in sharp contrast to its dual, the Art Gallery problem, which was recently shown to be $\exists \mathbb{R}$-complete by Abrahamsen et al. and therefore neither lies in $NP$ nor admits a polynomial-size discretization unless $NP=\exists \mathbb{R}$. In contrast, we prove that Witness Set for simple polygons admits a finite discretization of size $n^{f(k)}$ for some function $f$. For comparison, even for simple polygons, Efrat and Har-Peled gave an algorithm for Art Gallery running in time $n^{O(k)}$ using tools from real algebraic geometry, and it appears difficult to obtain such algorithms without this machinery. Our approach for Witness Set, on the other hand, is purely combinatorial and relies on discretization, leading to an $n^{f(k)}$-time algorithm. Although Amit et al. claimed more than fifteen years ago that Witness Set is $NP$-hard, no proof or reference was provided. We show that the discrete version of the Witness Set problem - where the witness set must be chosen from a given finite point set $Q$ (instead of allowing witnesses to be chosen anywhere in the polygon), referred to as Discrete Witness Set - is $NP$-complete, even when the input is restricted to rectilinear polygons with holes. However, for simple polygons, Discrete Witness Set admits a polynomial-time algorithm by Das et al. Thus, it remains an open question whether the Witness Set problem is $NP$-hard.
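
The witness set definition can be restated in terms of visibility regions. The notation $\mathrm{Vis}(x)$ is introduced here for illustration only and assumes the usual notion of visibility, in which two points see each other when the segment joining them stays inside $P$:

$$
\mathrm{Vis}(x) = \{\, p \in P : \overline{xp} \subseteq P \,\}, \qquad
X \text{ is a witness set} \iff \mathrm{Vis}(x) \cap \mathrm{Vis}(y) = \emptyset \ \text{ for all distinct } x, y \in X .
$$

Under that assumption the two forms are equivalent: a point of $P$ visible from two witnesses lies in both of their visibility regions, and conversely any point in such an intersection is seen by two witnesses.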

CV · Sep 22, 2023
Domain Adaptive Few-Shot Open-Set Learning

Debabrata Pal, Deeptej More, Sai Bhargav et al.

Few-shot learning has made impressive strides in addressing the crucial challenges of recognizing unknown samples from novel classes in target query sets and managing visual shifts between domains. However, existing techniques fall short when it comes to identifying target outliers under domain shifts by learning to reject pseudo-outliers from the source domain, resulting in an incomplete solution to both problems. To address these challenges comprehensively, we propose a novel approach called Domain Adaptive Few-Shot Open Set Recognition (DA-FSOS) and introduce a meta-learning-based architecture named DAFOS-NET. During training, our model learns a shared and discriminative embedding space while creating a pseudo open-space decision boundary, given a fully-supervised source domain and a label-disjoint few-shot target domain. To enhance data density, we use a pair of conditional adversarial networks with tunable noise variances to augment both domains' closed and pseudo-open spaces. Furthermore, we propose a domain-specific batch-normalized class prototypes alignment strategy to align both domains globally while ensuring class-discriminativeness through novel metric objectives. Our training approach ensures that DAFOS-NET can generalize well to new scenarios in the target domain. We present three benchmarks for DA-FSOS based on the Office-Home, mini-ImageNet/CUB, and DomainNet datasets and demonstrate the efficacy of DAFOS-NET through extensive experimentation.
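
As a rough illustration of open-set recognition with class prototypes, the Python sketch below averages the support embeddings of each class and labels a query "unknown" when it is too far from every prototype. It is not the DAFOS-NET architecture: the fixed `reject_distance` threshold is an assumed stand-in for the learned pseudo open-space decision boundary, and the function names and toy embeddings are hypothetical.

```python
import math


def class_prototypes(support):
    """Map {label: [embedding, ...]} to {label: mean embedding}."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[d] for v in vectors) / len(vectors)
                             for d in range(dim)]
    return prototypes


def classify_open_set(query, prototypes, reject_distance):
    """Return the nearest prototype's label, or "unknown" if it is too far.

    `reject_distance` is an assumed hand-set threshold standing in for a
    learned open-space boundary.
    """
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(query, proto)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= reject_distance else "unknown"


# Toy 2-D embeddings for a 2-way, 2-shot episode.
support = {"chair": [[0.0, 0.0], [2.0, 0.0]],
           "lamp": [[10.0, 10.0], [10.0, 12.0]]}
prototypes = class_prototypes(support)
known = classify_open_set([1.5, 0.5], prototypes, reject_distance=2.0)
outlier = classify_open_set([5.0, 5.0], prototypes, reject_distance=2.0)
```

Here the first query sits close to the "chair" prototype and is accepted, while the second lies between the two classes, beyond the threshold of both, and is rejected.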

CV · May 9, 2024
Vision-Language Modeling with Regularized Spatial Transformer Networks for All Weather Crosswind Landing of Aircraft

Debabrata Pal, Anvita Singh, Saumya Saumya et al.

The intrinsic capability of the Human Vision System (HVS) to perceive depth of field, together with the possibility of Instrument Landing System (ILS) failure, leads a pilot to prefer a vision-based manual landing over an autoland approach. However, harsh weather creates challenges, and a pilot must have a clear view of runway elements before the minimum decision altitude. To aid in manual landing, a vision-based system trained to clear weather-induced visual degradations requires a robust landing dataset under various climatic conditions. However, flying an aircraft in dangerous weather to acquire such a dataset compromises safety. Also, such a system fails to generate reliable warnings, as localization of runway elements suffers from projective distortion while landing in crosswind. To combat this, we propose to synthesize harsh weather landing images by training a prompt-based climatic diffusion network. Also, we optimize a weather distillation model using a novel diffusion-distillation loss to learn to clear these visual degradations. Specifically, the distillation model learns an inverse relationship with the diffusion network. At inference time, the pre-trained distillation network directly clears weather-impacted onboard camera images, which can be further projected to display devices for improved visibility. Then, to tackle crosswind landing, a novel Regularized Spatial Transformer Networks (RuSTaN) module accurately warps landing images. It minimizes the localization error of the runway object detector and helps generate reliable internal software warnings. Finally, we curated an aircraft landing dataset (AIRLAD) by simulating a landing scenario under various weather degradations and experimentally validated our contributions.
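
To make the warping step concrete, the following Python sketch shows the sampling half of a generic spatial transformer: given a 3x3 projective matrix, each output pixel is filled from the input location the matrix maps it to. This is only an assumption-laden illustration and not the RuSTaN module. In a real spatial transformer the matrix is predicted by a learned localization network and sampling is usually bilinear; here the matrix is supplied by hand, sampling is nearest-neighbor, and the regularization the abstract refers to is not reproduced. The function name `warp_image` is hypothetical.

```python
def warp_image(image, matrix, fill=0):
    """Inverse-warp a 2D list with a 3x3 projective matrix.

    Output pixel (x, y) takes the value of the input pixel nearest to
    matrix @ (x, y, 1) after dividing by the homogeneous coordinate.
    Samples that fall outside the input are set to `fill`.
    """
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
            sy = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
            w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
            if w == 0:
                continue  # point maps to infinity; leave the fill value
            col, row = round(sx / w), round(sy / w)
            if 0 <= row < height and 0 <= col < width:
                out[y][x] = image[row][col]
    return out


image = [[1, 2, 3],
         [4, 5, 6]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shift_left = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]  # sample one pixel to the right
unchanged = warp_image(image, identity)
shifted = warp_image(image, shift_left)
```

A matrix with a non-trivial bottom row would produce a perspective warp, which is the kind of correction needed for the projective distortion described for crosswind landings.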