69.9LGJun 2
PURGE: Projected Unlearning via Retain-Guided ErasureVedant Jawandhia, Daksh Ahuja, Ghufran Alam Siddiqui et al.
We propose PURGE, a machine unlearning algorithm built on a simple but an under-exploited observation: continual learning (CL) and machine unlearning (MU) which are fundamentally dual problems. CL tries to learn new tasks without forgetting old ones; MU tries to erase specific data without hurting retained performance representing the same underlying tension in opposite directions. PURGE leverages this duality by adapting gradient projection from A-GEM (Chaudhry et al., 2019) so that every unlearning step is constrained to not increase the retain-set loss. On top of this, it performs multi-layer representation erasure, pushing forget-set activations in intermediate layers towards the retain distribution to remove information from hidden representations rather than just suppressing it at the output. A key design choice is the retain-confusion target: rather than pushing forget outputs toward the uniform distribution, which we found to be surprisingly easy for membership inference attacks to detect, we instead target the model's natural confusion pattern on retain data. This makes the unlearned model hard to distinguish from one retrained from scratch. Two self-regulating stopping criteria (a retain-loss budget and a forget-accuracy target) let the algorithm decide on its own when to stop, removing the need for manual epoch tuning. In experiments on five datasets (CIFAR-10, MNIST, SVHN, STL10, PathMNIST) across 22 class-level forgetting tasks, PURGE consistently keeps retain accuracy above 96% while achieving MIA AUROC close to 0.5 (the ideal), outperforming gradient ascent, KL-uniform, and several published baselines on the privacy-utility frontier.
LGJul 12, 2022Code
TabSynDex: A Universal Metric for Robust Evaluation of Synthetic Tabular DataVikram S Chundawat, Ayush K Tarun, Murari Mandal et al.
Synthetic tabular data generation becomes crucial when real data is limited, expensive to collect, or simply cannot be used due to privacy concerns. However, producing good quality synthetic data is challenging. Several probabilistic, statistical, generative adversarial networks (GANs), and variational auto-encoder (VAEs) based approaches have been presented for synthetic tabular data generation. Once generated, evaluating the quality of the synthetic data is quite challenging. Some of the traditional metrics have been used in the literature but there is lack of a common, robust, and single metric. This makes it difficult to properly compare the effectiveness of different synthetic tabular data generation methods. In this paper we propose a new universal metric, TabSynDex, for robust evaluation of synthetic data. The proposed metric assesses the similarity of synthetic data with real data through different component scores which evaluate the characteristics that are desirable for ``high quality'' synthetic data. Being a single score metric and having an implicit bound, TabSynDex can also be used to observe and evaluate the training of neural network based approaches. This would help in obtaining insights that was not possible earlier. We present several baseline models for comparative analysis of the proposed evaluation metric with existing generative models. We also give a comparative analysis between TabSynDex and existing synthetic tabular data evaluation metrics. This shows the effectiveness and universality of our metric over the existing metrics. Source Code: \url{https://github.com/vikram2000b/tabsyndex}
CVJan 13, 2023Code
LVRNet: Lightweight Image Restoration for Aerial Images under Low VisibilityEsha Pahwa, Achleshwar Luthra, Pratik Narang
Learning to recover clear images from images having a combination of degrading factors is a challenging task. That being said, autonomous surveillance in low visibility conditions caused by high pollution/smoke, poor air quality index, low light, atmospheric scattering, and haze during a blizzard becomes even more important to prevent accidents. It is thus crucial to form a solution that can result in a high-quality image and is efficient enough to be deployed for everyday use. However, the lack of proper datasets available to tackle this task limits the performance of the previous methods proposed. To this end, we generate the LowVis-AFO dataset, containing 3647 paired dark-hazy and clear images. We also introduce a lightweight deep learning model called Low-Visibility Restoration Network (LVRNet). It outperforms previous image restoration methods with low latency, achieving a PSNR value of 25.744 and an SSIM of 0.905, making our approach scalable and ready for practical use. The code and data can be found at https://github.com/Achleshwar/LVRNet.
CRDec 30, 2025
FAST-IDS: A Fast Two-Stage Intrusion Detection System with Hybrid Compression for Real-Time Threat Detection in Connected and Autonomous VehiclesDevika S, Vishnu Hari, Pratik Narang et al.
We have implemented a multi-stage IDS for CAVs that can be deployed to resourec-constrained environments after hybrid model compression.
CLJan 27
A Hybrid Supervised-LLM Pipeline for Actionable Suggestion Mining in Unstructured Customer ReviewsAakash Trivedi, Aniket Upadhyay, Pratik Narang et al.
Extracting actionable suggestions from customer reviews is essential for operational decision-making, yet these directives are often embedded within mixed-intent, unstructured text. Existing approaches either classify suggestion-bearing sentences or generate high-level summaries, but rarely isolate the precise improvement instructions businesses need. We evaluate a hybrid pipeline combining a high-recall RoBERTa classifier trained with a precision-recall surrogate to reduce unrecoverable false negatives with a controlled, instruction-tuned LLM for suggestion extraction, categorization, clustering, and summarization. Across real-world hospitality and food datasets, the hybrid system outperforms prompt-only, rule-based, and classifier-only baselines in extraction accuracy and cluster coherence. Human evaluations further confirm that the resulting suggestions and summaries are clear, faithful, and interpretable. Overall, our results show that hybrid reasoning architectures achieve meaningful improvements fine-grained actionable suggestion mining while highlighting challenges in domain adaptation and efficient local deployment.
CRDec 30, 2025
FedLiTeCAN : A Federated Lightweight Transformer for Fast and Robust CAN Bus Intrusion DetectionDevika S, Pratik Narang, Tejasvi Alladi
This work implements a lightweight Transformer model for IDS in the domain of Connected and Autonomous Vehicles
CRDec 30, 2025
FedSecureFormer: A Fast, Federated and Secure Transformer Framework for Lightweight Intrusion Detection in Connected and Autonomous VehiclesDevika S, Vishnu Hari, Pratik Narang et al.
This works presents an encoder-only transformer built with minimum layers for intrusion detection in the domain of Connected and Autonomous Vehicles using Federated Learning.
CVSep 30, 2023
InFER: A Multi-Ethnic Indian Facial Expression Recognition DatasetSyed Sameen Ahmad Rizvi, Preyansh Agrawal, Jagat Sesh Challa et al.
The rapid advancement in deep learning over the past decade has transformed Facial Expression Recognition (FER) systems, as newer methods have been proposed that outperform the existing traditional handcrafted techniques. However, such a supervised learning approach requires a sufficiently large training dataset covering all the possible scenarios. And since most people exhibit facial expressions based upon their age group, gender, and ethnicity, a diverse facial expression dataset is needed. This becomes even more crucial while developing a FER system for the Indian subcontinent, which comprises of a diverse multi-ethnic population. In this work, we present InFER, a real-world multi-ethnic Indian Facial Expression Recognition dataset consisting of 10,200 images and 4,200 short videos of seven basic facial expressions. The dataset has posed expressions of 600 human subjects, and spontaneous/acted expressions of 6000 images crowd-sourced from the internet. To the best of our knowledge InFER is the first of its kind consisting of images from 600 subjects from very diverse ethnicity of the Indian Subcontinent. We also present the experimental results of baseline & deep FER methods on our dataset to substantiate its usability in real-world practical applications.
CVAug 4, 2024
LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake GenerationDwij Mehta, Aditya Mehta, Pratik Narang
Over the past decade, there has been tremendous progress in the domain of synthetic media generation. This is mainly due to the powerful methods based on generative adversarial networks (GANs). Very recently, diffusion probabilistic models, which are inspired by non-equilibrium thermodynamics, have taken the spotlight. In the realm of image generation, diffusion models (DMs) have exhibited remarkable proficiency in producing both realistic and heterogeneous imagery through their stochastic sampling procedure. This paper proposes a novel facial swapping module, termed as LDFaceNet (Latent Diffusion based Face Swapping Network), which is based on a guided latent diffusion model that utilizes facial segmentation and facial recognition modules for a conditioned denoising process. The model employs a unique loss function to offer directional guidance to the diffusion process. Notably, LDFaceNet can incorporate supplementary facial guidance for desired outcomes without any retraining. To the best of our knowledge, this represents the first application of the latent diffusion model in the face-swapping task without prior training. The results of this study demonstrate that the proposed method can generate extremely realistic and coherent images by leveraging the potential of the diffusion model for facial swapping, thereby yielding superior visual outcomes and greater diversity.
LGFeb 6
Trust Regions Sell, But Who's Buying? Overlap Geometry as an Alternative Trust Region for Policy OptimizationGaurish Trivedi, Alakh Sharma, Kartikey Singh Bhandari et al.
Standard trust-region methods constrain policy updates via Kullback-Leibler (KL) divergence. However, KL controls only an average divergence and does not directly prevent rare, large likelihood-ratio excursions that destabilize training--precisely the failure mode that motivates heuristics such as PPO's clipping. We propose overlap geometry as an alternative trust region, constraining distributional overlap via the Bhattacharyya coefficient (closely related to the Hellinger/Renyi-1/2 geometry). This objective penalizes separation in the ratio tails, yielding tighter control over likelihood-ratio excursions without relying on total variation bounds that can be loose in tail regimes. We derive Bhattacharyya-TRPO (BTRPO) and Bhattacharyya-PPO (BPPO), enforcing overlap constraints via square-root ratio updates: BPPO clips the square-root ratio q = sqrt(r), and BTRPO applies a quadratic Hellinger/Bhattacharyya penalty. Empirically, overlap-based updates improve robustness and aggregate performance as measured by RLiable under matched training budgets, suggesting overlap constraints as a practical, principled alternative to KL for stable policy optimization.
CVNov 13, 2025
Exposing DeepFakes via Hyperspectral Domain MappingAditya Mehta, Swarnim Chaudhary, Pratik Narang et al.
Modern generative and diffusion models produce highly realistic images that can mislead human perception and even sophisticated automated detection systems. Most detection methods operate in RGB space and thus analyze only three spectral channels. We propose HSI-Detect, a two-stage pipeline that reconstructs a 31-channel hyperspectral image from a standard RGB input and performs detection in the hyperspectral domain. Expanding the input representation into denser spectral bands amplifies manipulation artifacts that are often weak or invisible in the RGB domain, particularly in specific frequency bands. We evaluate HSI-Detect across FaceForensics++ dataset and show the consistent improvements over RGB-only baselines, illustrating the promise of spectral-domain mapping for Deepfake detection.
31.3AIMay 3
Beyond Sentiment: A Multi-Agent Pipeline for Actionable Business Advice from ReviewsKartikey Singh Bhandari, Tanish Jain, Archit Agrawal et al.
Customer reviews contain valuable signals about service quality, but converting large-scale review corpora into actionable business recommendations remains difficult. Standard sentiment/aspect analysis is largely descriptive, while direct prompting of large language models (LLMs) often yields generic and repetitive advice that is weakly grounded in user feedback. We propose a hierarchical decision-support pipeline that explicitly separates signal compression, problem abstraction, candidate generation, objective-based evaluation, and cost-aware routing into different agents. This architectural decomposition produces auditable intermediate artifacts and enables controllable trade-offs between advice quality and token budget. Experiments on Yelp reviews from three service domains show consistent improvements over single-pass LLM baselines across multiple advice quality dimensions, including actionability, relevance, and non-redundancy. A human evaluation further indicates that users generally prefer our system's recommendations. These results highlight the value of structured agentic decomposition for scalable, cost-aware business decision support.
AIDec 19, 2025
Large Language Models as Pokémon Battle Agents: Strategic Play and Content GenerationDaksh Jain, Aarya Jain, Ashutosh Desai et al.
Strategic decision-making in Pokémon battles presents a unique testbed for evaluating large language models. Pokémon battles demand reasoning about type matchups, statistical trade-offs, and risk assessment, skills that mirror human strategic thinking. This work examines whether Large Language Models (LLMs) can serve as competent battle agents, capable of both making tactically sound decisions and generating novel, balanced game content. We developed a turn-based Pokémon battle system where LLMs select moves based on battle state rather than pre-programmed logic. The framework captures essential Pokémon mechanics: type effectiveness multipliers, stat-based damage calculations, and multi-Pokémon team management. Through systematic evaluation across multiple model architectures we measured win rates, decision latency, type-alignment accuracy, and token efficiency. These results suggest LLMs can function as dynamic game opponents without domain-specific training, offering a practical alternative to reinforcement learning for turn-based strategic games. The dual capability of tactical reasoning and content creation, positions LLMs as both players and designers, with implications for procedural generation and adaptive difficulty systems in interactive entertainment.
CVDec 24, 2025
Understanding Virality: A Rubric based Vision-Language Model Framework for Short-Form Edutainment EvaluationArnav Gupta, Gurekas Singh Sahney, Hardik Rathi et al.
Evaluating short-form video content requires moving beyond surface-level quality metrics toward human-aligned, multimodal reasoning. While existing frameworks like VideoScore-2 assess visual and semantic fidelity, they do not capture how specific audiovisual attributes drive real audience engagement. In this work, we propose a data-driven evaluation framework that uses Vision-Language Models (VLMs) to extract unsupervised audiovisual features, clusters them into interpretable factors, and trains a regression-based evaluator to predict engagement on short-form edutainment videos. Our curated YouTube Shorts dataset enables systematic analysis of how VLM-derived features relate to human engagement behavior. Experiments show strong correlations between predicted and actual engagement, demonstrating that our lightweight, feature-based evaluator provides interpretable and scalable assessments compared to traditional metrics (e.g., SSIM, FID). By grounding evaluation in both multimodal feature importance and human-centered engagement signals, our approach advances toward robust and explainable video understanding.
CLDec 18, 2025
Navigating the Reality Gap: Privacy-Preserving On-Device Continual Adaptation of ASR for Clinical TelephonyDarshil Chauhan, Adityasinh Solanki, Vansh Patel et al.
Automatic Speech Recognition (ASR) holds immense potential to assist in clinical documentation and patient report generation, particularly in resource-constrained regions. However, deployment is currently hindered by a technical deadlock: a severe "Reality Gap" between laboratory performance and noisy, real-world clinical audio, coupled with strict privacy and resource constraints. Such adaptation is essential for clinical telephony systems, where patient speech is highly variable and transcription errors can directly impact downstream clinical workflows. We quantify this gap, showing that a robust multilingual model (IndicWav2Vec) degrades up to a 40.94% WER on rural clinical telephony speech from India, rendering it unusable. We demonstrate consistent improvements on these helpline interactions without transmitting raw patient data off-device via an on-device continual adaptation framework using Low-Rank Adaptation (LoRA). We conduct an investigative study of stabilization strategies, characterizing the trade-offs between data-driven and parameter-driven approaches. Our results demonstrate that multi-domain Experience Replay (ER) yields the primary performance gains, achieving a 17.1% relative improvement in target WER and reducing catastrophic forgetting by 55% compared to naive adaptation. Furthermore, we investigate a stabilized importance estimation strategy (Absolute Fisher) to ensure robust convergence against the high-variance gradients common in clinical telephony speech. Finally, we verify via a domain-specific spot check that acoustic adaptation is a fundamental prerequisite for usability in healthcare settings which cannot be bypassed by language models alone.
CLDec 18, 2025
From Facts to Conclusions : Integrating Deductive Reasoning in Retrieval-Augmented LLMsShubham Mishra, Samyek Jain, Gorang Mehrishi et al.
Retrieval-Augmented Generation (RAG) grounds large language models (LLMs) in external evidence, but fails when retrieved sources conflict or contain outdated or subjective information. Prior work address these issues independently but lack unified reasoning supervision. We propose a reasoning-trace-augmented RAG framework that adds structured, interpretable reasoning across three stages : (1) document-level adjudication, (2) conflict analysis, and (3) grounded synthesis, producing citation-linked answers or justified refusals. A Conflict-Aware Trust-Score (CATS) pipeline is introduced which evaluates groundedness, factual correctness, refusal accuracy, and conflict-behavior alignment using an LLM-as-a-Judge. Our 539-query reasoning dataset and evaluation pipeline establish a foundation for conflict-aware, interpretable RAG systems. Experimental results demonstrate substantial gains over baselines, most notably with Qwen, where Supervised Fine-Tuning improved End-to-End answer correctness from 0.069 to 0.883 and behavioral adherence from 0.074 to 0.722.
CVFeb 14, 2025
AffectSRNet : Facial Emotion-Aware Super-Resolution NetworkSyed Sameen Ahmad Rizvi, Soham Kumar, Aryan Seth et al.
Facial expression recognition (FER) systems in low-resolution settings face significant challenges in accurately identifying expressions due to the loss of fine-grained facial details. This limitation is especially problematic for applications like surveillance and mobile communications, where low image resolution is common and can compromise recognition accuracy. Traditional single-image face super-resolution (FSR) techniques, however, often fail to preserve the emotional intent of expressions, introducing distortions that obscure the original affective content. Given the inherently ill-posed nature of single-image super-resolution, a targeted approach is required to balance image quality enhancement with emotion retention. In this paper, we propose AffectSRNet, a novel emotion-aware super-resolution framework that reconstructs high-quality facial images from low-resolution inputs while maintaining the intensity and fidelity of facial expressions. Our method effectively bridges the gap between image resolution and expression accuracy by employing an expression-preserving loss function, specifically tailored for FER applications. Additionally, we introduce a new metric to assess emotion preservation in super-resolved images, providing a more nuanced evaluation of FER system performance in low-resolution scenarios. Experimental results on standard datasets, including CelebA, FFHQ, and Helen, demonstrate that AffectSRNet outperforms existing FSR approaches in both visual quality and emotion fidelity, highlighting its potential for integration into practical FER applications. This work not only improves image clarity but also ensures that emotion-driven applications retain their core functionality in suboptimal resolution environments, paving the way for broader adoption in FER systems.
CVOct 25, 2024
Balancing the Scales: Enhancing Fairness in Facial Expression Recognition with Latent AlignmentSyed Sameen Ahmad Rizvi, Aryan Seth, Pratik Narang
Automatically recognizing emotional intent using facial expression has been a thoroughly investigated topic in the realm of computer vision. Facial Expression Recognition (FER), being a supervised learning task, relies heavily on substantially large data exemplifying various socio-cultural demographic attributes. Over the past decade, several real-world in-the-wild FER datasets that have been proposed were collected through crowd-sourcing or web-scraping. However, most of these practically used datasets employ a manual annotation methodology for labeling emotional intent, which inherently propagates individual demographic biases. Moreover, these datasets also lack an equitable representation of various socio-cultural demographic groups, thereby inducing a class imbalance. Bias analysis and its mitigation have been investigated across multiple domains and problem settings, however, in the FER domain, this is a relatively lesser explored area. This work leverages representation learning based on latent spaces to mitigate bias in facial expression recognition systems, thereby enhancing a deep learning model's fairness and overall accuracy.
CVDec 17, 2025
Lights, Camera, Consistency: A Multistage Pipeline for Character-Stable AI Video StoriesChayan Jain, Rishant Sharma, Archit Garg et al.
Generating long, cohesive video stories with consistent characters is a significant challenge for current text-to-video AI. We introduce a method that approaches video generation in a filmmaker-like manner. Instead of creating a video in one step, our proposed pipeline first uses a large language model to generate a detailed production script. This script guides a text-to-image model in creating consistent visuals for each character, which then serve as anchors for a video generation model to synthesize each scene individually. Our baseline comparisons validate the necessity of this multi-stage decomposition; specifically, we observe that removing the visual anchoring mechanism results in a catastrophic drop in character consistency scores (from 7.99 to 0.55), confirming that visual priors are essential for identity preservation. Furthermore, we analyze cultural disparities in current models, revealing distinct biases in subject consistency and dynamic degree between Indian vs Western-themed generations.
AIOct 18, 2025
ReviewSense: Transforming Customer Review Dynamics into Actionable Business InsightsSiddhartha Krothapalli, Tridib Kumar Das, Praveen Kumar et al.
As customer feedback becomes increasingly central to strategic growth, the ability to derive actionable insights from unstructured reviews is essential. While traditional AI-driven systems excel at predicting user preferences, far less work has focused on transforming customer reviews into prescriptive, business-facing recommendations. This paper introduces ReviewSense, a novel prescriptive decision support framework that leverages advanced large language models (LLMs) to transform customer reviews into targeted, actionable business recommendations. By identifying key trends, recurring issues, and specific concerns within customer sentiments, ReviewSense extends beyond preference-based systems to provide businesses with deeper insights for sustaining growth and enhancing customer loyalty. The novelty of this work lies in integrating clustering, LLM adaptation, and expert-driven evaluation into a unified, business-facing pipeline. Preliminary manual evaluations indicate strong alignment between the model's recommendations and business objectives, highlighting its potential for driving data-informed decision-making. This framework offers a new perspective on AI-driven sentiment analysis, demonstrating its value in refining business strategies and maximizing the impact of customer feedback.
LGSep 27, 2025
Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent LearningAlakh Sharma, Gaurish Trivedi, Kartikey Bhandari et al.
Scalable multi-agent reinforcement learning (MARL) remains a central challenge for AI. Existing population-based methods, like Policy-Space Response Oracles, PSRO, require storing explicit policy populations and constructing full payoff matrices, incurring quadratic computation and linear memory costs. We present Generative Evolutionary Meta-Solver (GEMS), a surrogate-free framework that replaces explicit populations with a compact set of latent anchors and a single amortized generator. Instead of exhaustively constructing the payoff matrix, GEMS relies on unbiased Monte Carlo rollouts, multiplicative-weights meta-dynamics, and a model-free empirical-Bernstein UCB oracle to adaptively expand the policy set. Best responses are trained within the generator using an advantage-based trust-region objective, eliminating the need to store and train separate actors. We evaluated GEMS in a variety of Two-player and Multi-Player games such as the Deceptive Messages Game, Kuhn Poker and Multi-Particle environment. We find that GEMS is up to ~6x faster, has 1.3x less memory usage than PSRO, while also reaps higher rewards simultaneously. These results demonstrate that GEMS retains the game theoretic guarantees of PSRO, while overcoming its fundamental inefficiencies, hence enabling scalable multi-agent learning in multiple domains.
LGAug 26, 2025
HAEPO: History-Aggregated Exploratory Policy OptimizationGaurish Trivedi, Alakh Sharma, Kartikey Singh Bhandari et al.
Exploration is essential in modern learning, from reinforcement learning environments with small neural policies to large language models (LLMs). Existing work, such as DPO, leverages full sequence log-likelihoods to capture an entire trajectory of the model's decisions, while methods like GRPO aggregate per-token ratios into a trajectory-level update. However, both often limit exploration on long-horizon tasks. We introduce History-Aggregated Exploratory Policy Optimization (HAEPO), a history-aware exploratory loss to combat these shortcomings. HAEPO compresses each trajectory into the sum of its logarithmic probabilities (a cumulative logarithmic likelihood), and applies a Plackett-Luce softmax across trajectories to obtain normalized weights proportional to their returns, thus encouraging broader exploration. We add entropy regularization to stabilize the aggressive updates to prevent premature collapse and a soft KL penalty relative to a frozen copy of the previous (reference) policy. Empirically, HAEPO converges fast, explores thoroughly, aligns closely with true rewards, and demonstrates robust learning behavior better or at par with PPO, GRPO, and DPO across diverse tasks. Thus, HAEPO provides a stable and interpretable framework by explicitly leveraging full-trajectory history while balancing exploration and stability.
IVFeb 10, 2021
Learning to Enhance Visual Quality via Hyperspectral Domain MappingHarsh Sinha, Aditya Mehta, Murari Mandal et al.
Deep learning based methods have achieved remarkable success in image restoration and enhancement, but most such methods rely on RGB input images. These methods fail to take into account the rich spectral distribution of natural images. We propose a deep architecture, SpecNet, which computes spectral profile to estimate pixel-wise dynamic range adjustment of a given image. First, we employ an unpaired cycle-consistent framework to generate hyperspectral images (HSI) from low-light input images. HSI is further used to generate a normal light image of the same scene. We incorporate a self-supervision and a spectral profile regularization network to infer a plausible HSI from an RGB image. We evaluate the benefits of optimizing the spectral profile for real and fake images in low-light conditions on the LOL Dataset.
CVFeb 10, 2021
Improving Aerial Instance Segmentation in the Dark with Self-Supervised Low Light EnhancementPrateek Garg, Murari Mandal, Pratik Narang
Low light conditions in aerial images adversely affect the performance of several vision based applications. There is a need for methods that can efficiently remove the low light attributes and assist in the performance of key vision tasks. In this work, we propose a new method that is capable of enhancing the low light image in a self-supervised fashion, and sequentially apply detection and segmentation tasks in an end-to-end manner. The proposed method occupies a very small overhead in terms of memory and computational power over the original algorithm and delivers superior results. Additionally, we propose the generation of a new low light aerial dataset using GANs, which can be used to evaluate vision based networks for similar adverse conditions.
CVNov 7, 2020
Domain-Aware Unsupervised Hyperspectral Reconstruction for Aerial Image DehazingAditya Mehta, Harsh Sinha, Murari Mandal et al.
Haze removal in aerial images is a challenging problem due to considerable variation in spatial details and varying contrast. Changes in particulate matter density often lead to degradation in visibility. Therefore, several approaches utilize multi-spectral data as auxiliary information for haze removal. In this paper, we propose SkyGAN for haze removal in aerial images. SkyGAN consists of 1) a domain-aware hazy-to-hyperspectral (H2H) module, and 2) a conditional GAN (cGAN) based multi-cue image-to-image translation module (I2I) for dehazing. The proposed H2H module reconstructs several visual bands from RGB images in an unsupervised manner, which overcomes the lack of hazy hyperspectral aerial image datasets. The module utilizes task supervision and domain adaptation in order to create a "hyperspectral catalyst" for image dehazing. The I2I module uses the hyperspectral catalyst along with a 12-channel multi-cue input and performs effective image dehazing by utilizing the entire visual spectrum. In addition, this work introduces a new dataset, called Hazy Aerial-Image (HAI) dataset, that contains more than 65,000 pairs of hazy and ground truth aerial images with realistic, non-homogeneous haze of varying density. The performance of SkyGAN is evaluated on the recent SateHaze1k dataset as well as the HAI dataset. We also present a comprehensive evaluation of HAI dataset with a representative set of state-of-the-art techniques in terms of PSNR and SSIM.
CVMay 7, 2020
NTIRE 2020 Challenge on NonHomogeneous DehazingCodruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu et al.
This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy image). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze free and nonhomogeneous hazy images recorded outdoor. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images. The nonhomogeneous haze has been produced using a professional haze generator that imitates the real conditions of haze scenes. 168 participants registered in the challenge and 27 teams competed in the final testing phase. The proposed solutions gauge the state-of-the-art in image dehazing.
NIJul 29, 2013
Real-time Peer-to-Peer Botnet Detection Framework based on Bayesian Regularized Neural NetworkSharath Chandra Guntuku, Pratik Narang, Chittaranjan Hota
Over the past decade, the Cyberspace has seen an increasing number of attacks coming from botnets using the Peer-to-Peer (P2P) architecture. Peer-to-Peer botnets use a decentralized Command & Control architecture. Moreover, a large number of such botnets already exist, and newer versions- which significantly differ from their parent bot- are also discovered practically every year. In this work, the authors propose and implement a novel hybrid framework for detecting P2P botnets in live network traffic by integrating Neural Networks with Bayesian Regularization. Bayesian Regularization helps in achieving better generalization of the dataset, thereby enabling the detection of botnet activity even of those bots which were never used in training the Neural Network. Hence such a framework is suitable for detection of newer and unseen botnets in live traffic of a network. This was verified by testing the Framework on test data unseen to the Detection module (using untrained botnet dataset), and the authors were successful in detecting this activity with an accuracy of 99.2 %.