CVApr 20, 2022Code
NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and ResultsRen Yang, Radu Timofte, Meisong Zheng et al. · tencent-ai
This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. This challenge includes three tracks. Track 1 aims at enhancing the videos compressed by HEVC at a fixed QP. Track 2 and Track 3 target both the super-resolution and quality enhancement of HEVC compressed video. They require x2 and x4 super-resolution, respectively. The three tracks totally attract more than 600 registrations. In the test phase, 8 teams, 8 teams and 12 teams submitted the final results to Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.
AIAug 23, 2022
AI and 6G into the Metaverse: Fundamentals, Challenges and Future Research TrendsMuhammad Zawish, Fayaz Ali Dharejo, Sunder Ali Khowaja et al.
Since Facebook was renamed Meta, a lot of attention, debate, and exploration have intensified about what the Metaverse is, how it works, and the possible ways to exploit it. It is anticipated that Metaverse will be a continuum of rapidly emerging technologies, usecases, capabilities, and experiences that will make it up for the next evolution of the Internet. Several researchers have already surveyed the literature on artificial intelligence (AI) and wireless communications in realizing the Metaverse. However, due to the rapid emergence and continuous evolution of technologies, there is a need for a comprehensive and in-depth survey of the role of AI, 6G, and the nexus of both in realizing the immersive experiences of Metaverse. Therefore, in this survey, we first introduce the background and ongoing progress in augmented reality (AR), virtual reality (VR), mixed reality (MR) and spatial computing, followed by the technical aspects of AI and 6G. Then, we survey the role of AI in the Metaverse by reviewing the state-of-the-art in deep learning, computer vision, and Edge AI to extract the requirements of 6G in Metaverse. Next, we investigate the promising services of B5G/6G towards Metaverse, followed by identifying the role of AI in 6G networks and 6G networks for AI in support of Metaverse applications, and the need for sustainability in Metaverse. Finally, we enlist the existing and potential applications, usecases, and projects to highlight the importance of progress in the Metaverse. Moreover, in order to provide potential research directions to researchers, we underline the challenges, research gaps, and lessons learned identified from the literature review of the aforementioned technologies.
CYApr 13, 2023
ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A ReviewSunder Ali Khowaja, Parus Khuwaja, Kapal Dev et al.
ChatGPT is another large language model (LLM) vastly available for the consumers on their devices but due to its performance and ability to converse effectively, it has gained a huge popularity amongst research as well as industrial community. Recently, many studies have been published to show the effectiveness, efficiency, integration, and sentiments of chatGPT and other LLMs. In contrast, this study focuses on the important aspects that are mostly overlooked, i.e. sustainability, privacy, digital divide, and ethics and suggests that not only chatGPT but every subsequent entry in the category of conversational bots should undergo Sustainability, PrivAcy, Digital divide, and Ethics (SPADE) evaluation. This paper discusses in detail the issues and concerns raised over chatGPT in line with aforementioned characteristics. We also discuss the recent EU AI Act briefly in accordance with the SPADE evaluation. We support our hypothesis by some preliminary data collection and visualizations along with hypothesized facts. We also suggest mitigations and recommendations for each of the concerns. Furthermore, we also suggest some policies and recommendations for EU AI policy act concerning ethics, digital divide, and sustainability
LGJul 4, 2023
SelfFed: Self-Supervised Federated Learning for Data Heterogeneity and Label Scarcity in Medical ImagesSunder Ali Khowaja, Kapal Dev, Syed Muhammad Anwar et al.
Self-supervised learning in the federated learning paradigm has been gaining a lot of interest both in industry and research due to the collaborative learning capability on unlabeled yet isolated data. However, self-supervised based federated learning strategies suffer from performance degradation due to label scarcity and diverse data distributions, i.e., data heterogeneity. In this paper, we propose the SelfFed framework for medical images to overcome data heterogeneity and label scarcity issues. The first phase of the SelfFed framework helps to overcome the data heterogeneity issue by leveraging the pre-training paradigm that performs augmentative modeling using Swin Transformer-based encoder in a decentralized manner. The label scarcity issue is addressed by fine-tuning paradigm that introduces a contrastive network and a novel aggregation strategy. We perform our experimental analysis on publicly available medical imaging datasets to show that SelfFed performs better when compared to existing baselines and works. Our method achieves a maximum improvement of 8.8% and 4.1% on Retina and COVID-FL datasets on non-IID datasets. Further, our proposed method outperforms existing baselines even when trained on a few (10%) labeled instances.
LGNov 21, 2022
SPIN: Simulated Poisoning and Inversion Network for Federated Learning-Based 6G Vehicular NetworksSunder Ali Khowaja, Parus Khuwaja, Kapal Dev et al.
The applications concerning vehicular networks benefit from the vision of beyond 5G and 6G technologies such as ultra-dense network topologies, low latency, and high data rates. Vehicular networks have always faced data privacy preservation concerns, which lead to the advent of distributed learning techniques such as federated learning. Although federated learning has solved data privacy preservation issues to some extent, the technique is quite vulnerable to model inversion and model poisoning attacks. We assume that the design of defense mechanism and attacks are two sides of the same coin. Designing a method to reduce vulnerability requires the attack to be effective and challenging with real-world implications. In this work, we propose simulated poisoning and inversion network (SPIN) that leverages the optimization approach for reconstructing data from a differential model trained by a vehicular node and intercepted when transmitted to roadside unit (RSU). We then train a generative adversarial network (GAN) to improve the generation of data with each passing round and global update from the RSU, accordingly. Evaluation results show the qualitative and quantitative effectiveness of the proposed approach. The attack initiated by SPIN can reduce up to 22% accuracy on publicly available datasets while just using a single attacker. We assume that revealing the simulation of such attacks would help us find its defense mechanism in an effective manner.
NIApr 16, 2022
IIFNet: A Fusion based Intelligent Service for Noisy Preamble Detection in 6GSunder Ali Khowaja, Kapal Dev, Parus Khuwaja et al.
In this article, we present our vision of preamble detection in a physical random access channel for next-generation (Next-G) networks using machine learning techniques. Preamble detection is performed to maintain communication and synchronization between devices of the Internet of Everything (IoE) and next-generation nodes. Considering the scalability and traffic density, Next-G networks have to deal with preambles corrupted by noise due to channel characteristics or environmental constraints. We show that when injecting 15% random noise, the detection performance degrades to 48%. We propose an informative instance-based fusion network (IIFNet) to cope with random noise and to improve detection performance, simultaneously. A novel sampling strategy for selecting informative instances from feature spaces has also been explored to improve detection performance. The proposed IIFNet is tested on a real dataset for preamble detection that was collected with the help of a reputable commercial company.
CVJul 18, 2023
FISTNet: FusIon of STyle-path generative Networks for Facial Style TransferSunder Ali Khowaja, Lewis Nkenyereye, Ghulam Mujtaba et al.
With the surge in emerging technologies such as Metaverse, spatial computing, and generative AI, the application of facial style transfer has gained a lot of interest from researchers as well as startups enthusiasts alike. StyleGAN methods have paved the way for transfer-learning strategies that could reduce the dependency on the huge volume of data that is available for the training process. However, StyleGAN methods have the tendency of overfitting that results in the introduction of artifacts in the facial images. Studies, such as DualStyleGAN, proposed the use of multipath networks but they require the networks to be trained for a specific style rather than generating a fusion of facial styles at once. In this paper, we propose a FusIon of STyles (FIST) network for facial images that leverages pre-trained multipath style transfer networks to eliminate the problem associated with lack of huge data volume in the training phase along with the fusion of multiple styles at the output. We leverage pre-trained styleGAN networks with an external style pass that use residual modulation block instead of a transform coding block. The method also preserves facial structure, identity, and details via the gated mapping unit introduced in this study. The aforementioned components enable us to train the network with very limited amount of data while generating high-quality stylized images. Our training process adapts curriculum learning strategy to perform efficient, flexible style and model fusion in the generative space. We perform extensive experiments to show the superiority of FISTNet in comparison to existing state-of-the-art methods.
CRAug 1, 2024
Pathway to Secure and Trustworthy ZSM for LLMs: Attacks, Defense, and OpportunitiesSunder Ali Khowaja, Parus Khuwaja, Kapal Dev et al.
Recently, large language models (LLMs) have been gaining a lot of interest due to their adaptability and extensibility in emerging applications, including communication networks. It is anticipated that ZSM networks will be able to support LLMs as a service, as they provide ultra reliable low-latency communications and closed loop massive connectivity. However, LLMs are vulnerable to data and model privacy issues that affect the trustworthiness of LLMs to be deployed for user-based services. In this paper, we explore the security vulnerabilities associated with fine-tuning LLMs in ZSM networks, in particular the membership inference attack. We define the characteristics of an attack network that can perform a membership inference attack if the attacker has access to the fine-tuned model for the downstream task. We show that the membership inference attacks are effective for any downstream task, which can lead to a personal data breach when using LLM as a service. The experimental results show that the attack success rate of maximum 92% can be achieved on named entity recognition task. Based on the experimental analysis, we discuss possible defense mechanisms and present possible research directions to make the LLMs more trustworthy in the context of ZSM networks.
CVAug 25, 2022
2nd Place Solutions for UG2+ Challenge 2022 -- D$^{3}$Net for Mitigating Atmospheric Turbulence from ImagesSunder Ali Khowaja, Ik Hyun Lee, Jiseok Yoon
This technical report briefly introduces to the D$^{3}$Net proposed by our team "TUK-IKLAB" for Atmospheric Turbulence Mitigation in $UG2^{+}$ Challenge at CVPR 2022. In the light of test and validation results on textual images to improve text recognition performance and hot-air balloon images for image enhancement, we can say that the proposed method achieves state-of-the-art performance. Furthermore, we also provide a visual comparison with publicly available denoising, deblurring, and frame averaging methods with respect to the proposed work. The proposed method ranked 2nd on the final leader-board of the aforementioned challenge in the testing phase, respectively.
AIFeb 19, 2025
Integration of Agentic AI with 6G Networks for Mission-Critical Applications: Use-case and ChallengesSunder Ali Khowaja, Kapal Dev, Muhammad Salman Pathan et al.
We are in a transformative era, and advances in Artificial Intelligence (AI), especially the foundational models, are constantly in the news. AI has been an integral part of many applications that rely on automation for service delivery, and one of them is mission-critical public safety applications. The problem with AI-oriented mission-critical applications is the humanin-the-loop system and the lack of adaptability to dynamic conditions while maintaining situational awareness. Agentic AI (AAI) has gained a lot of attention recently due to its ability to analyze textual data through a contextual lens while quickly adapting to conditions. In this context, this paper proposes an AAI framework for mission-critical applications. We propose a novel framework with a multi-layer architecture to realize the AAI. We also present a detailed implementation of AAI layer that bridges the gap between network infrastructure and missioncritical applications. Our preliminary analysis shows that the AAI reduces initial response time by 5.6 minutes on average, while alert generation time is reduced by 15.6 seconds on average and resource allocation is improved by up to 13.4%. We also show that the AAI methods improve the number of concurrent operations by 40, which reduces the recovery time by up to 5.2 minutes. Finally, we highlight some of the issues and challenges that need to be considered when implementing AAI frameworks.
NIFeb 3, 2025
Advanced Architectures Integrated with Agentic AI for Next-Generation Wireless NetworksKapal Dev, Sunder Ali Khowaja, Keshav Singh et al.
This paper investigates a range of cutting-edge technologies and architectural innovations aimed at simplifying network operations, reducing operational expenditure (OpEx), and enabling the deployment of new service models. The focus is on (i) Proposing novel, more efficient 6G architectures, with both Control and User planes enabling the seamless expansion of services, while addressing long-term 6G network evolution. (ii) Exploring advanced techniques for constrained artificial intelligence (AI) operations, particularly the design of AI agents for real-time learning, optimizing energy consumption, and the allocation of computational resources. (iii) Identifying technologies and architectures that support the orchestration of backend services using serverless computing models across multiple domains, particularly for vertical industries. (iv) Introducing optically-based, ultra-high-speed, low-latency network architectures, with fast optical switching and real-time control, replacing conventional electronic switching to reduce power consumption by an order of magnitude.
CYFeb 28, 2025
EdgeAIGuard: Agentic LLMs for Minor Protection in Digital SpacesGhulam Mujtaba, Sunder Ali Khowaja, Kapal Dev
Social media has become integral to minors' daily lives and is used for various purposes, such as making friends, exploring shared interests, and engaging in educational activities. However, the increase in screen time has also led to heightened challenges, including cyberbullying, online grooming, and exploitations posed by malicious actors. Traditional content moderation techniques have proven ineffective against exploiters' evolving tactics. To address these growing challenges, we propose the EdgeAIGuard content moderation approach that is designed to protect minors from online grooming and various forms of digital exploitation. The proposed method comprises a multi-agent architecture deployed strategically at the network edge to enable rapid detection with low latency and prevent harmful content targeting minors. The experimental results show the proposed method is significantly more effective than the existing approaches.
CYApr 18, 2025
Framework, Standards, Applications and Best practices of Responsible AI : A Comprehensive SurveyThippa Reddy Gadekallu, Kapal Dev, Sunder Ali Khowaja et al.
Responsible Artificial Intelligence (RAI) is a combination of ethics associated with the usage of artificial intelligence aligned with the common and standard frameworks. This survey paper extensively discusses the global and national standards, applications of RAI, current technology and ongoing projects using RAI, and possible challenges in implementing and designing RAI in the industries and projects based on AI. Currently, ethical standards and implementation of RAI are decoupled which caters each industry to follow their own standards to use AI ethically. Many global firms and government organizations are taking necessary initiatives to design a common and standard framework. Social pressure and unethical way of using AI forces the RAI design rather than implementation.
CVApr 20, 2025
NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and ResultsZheng Chen, Kai Liu, Jue Gong et al.
This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that achieve state-of-the-art SR performance. To reflect the dual objectives of image SR research, the challenge includes two sub-tracks: (1) a restoration track, emphasizes pixel-wise accuracy and ranks submissions based on PSNR; (2) a perceptual track, focuses on visual realism and ranks results by a perceptual score. A total of 286 participants registered for the competition, with 25 teams submitting valid entries. This report summarizes the challenge design, datasets, evaluation protocol, the main results, and methods of each team. The challenge serves as a benchmark to advance the state of the art and foster progress in image SR.
CVMar 18, 2024
GT-Rain Single Image Deraining Challenge ReportHoward Zhang, Yunhao Ba, Ethan Yang et al.
This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained on the GT-Rain dataset and evaluated on an extension of the dataset consisting of 15 additional scenes. Scenes in GT-Rain are comprised of real rainy image and ground truth image captured moments after the rain had stopped. 275 participants were registered in the challenge and 55 competed in the final testing phase.
11.1AIApr 2
SEAL: An Open, Auditable, and Fair Data Generation Framework for AI-Native 6G NetworksSunder Ali Khowaja, Kapal Dev, Engin Zeydan et al.
AI-native 6G networks promise to transform the telecom industry by enabling dynamic resource allocation, predictive maintenance, and ultra-reliable low-latency communications across all layers, which are essential for applications such as smart cities, autonomous vehicles, and immersive XR. However, the deployment of 6G systems results in severe data scarcity, hindering the training of efficient AI models. Synthetic data generation is extensively used to fill this gap; however, it introduces challenges related to dataset bias, auditability, and compliance with regulatory frameworks. In this regard, we propose the Synthetic Data Generation with Ethics Audit Loop (SEAL) framework, which extends baseline modular pipelines with an Ethical and Regulatory Compliance by Design (ERCD) module and a Federated Learning (FL) feedback system. The ERCD integrates fairness, bias detection, and standardized audit trails for regulatory mapping, while the FL enables privacy-preserving calibration using aggregated insights from real testbeds to close the reality-simulation gap. Results show that the SEAL framework outperforms existing methods in terms of Frechet Inception Distance, equalized odds, and accuracy. These results validate the framework's ability to generate auditable and bias-mitigated synthetic data for responsible AI-native 6G development.
LGJan 19, 2022
Towards Energy Efficient Distributed Federated Learning for 6G NetworksSunder Ali Khowaja, Kapal Dev, Parus Khuwaja et al.
The provision of communication services via portable and mobile devices, such as aerial base stations, is a crucial concept to be realized in 5G/6G networks. Conventionally, IoT/edge devices need to transmit the data directly to the base station for training the model using machine learning techniques. The data transmission introduces privacy issues that might lead to security concerns and monetary losses. Recently, Federated learning was proposed to partially solve privacy issues via model-sharing with base station. However, the centralized nature of federated learning only allow the devices within the vicinity of base stations to share the trained models. Furthermore, the long-range communication compels the devices to increase transmission power, which raises the energy efficiency concerns. In this work, we propose distributed federated learning (DBFL) framework that overcomes the connectivity and energy efficiency issues for distant devices. The DBFL framework is compatible with mobile edge computing architecture that connects the devices in a distributed manner using clustering protocols. Experimental results show that the framework increases the classification performance by 7.4\% in comparison to conventional federated learning while reducing the energy consumption.
CRJan 12, 2022
Get your Foes Fooled: Proximal Gradient Split Learning for Defense against Model Inversion Attacks on IoMT dataSunder Ali Khowaja, Ik Hyun Lee, Kapal Dev et al.
The past decade has seen a rapid adoption of Artificial Intelligence (AI), specifically the deep learning networks, in Internet of Medical Things (IoMT) ecosystem. However, it has been shown recently that the deep learning networks can be exploited by adversarial attacks that not only make IoMT vulnerable to the data theft but also to the manipulation of medical diagnosis. The existing studies consider adding noise to the raw IoMT data or model parameters which not only reduces the overall performance concerning medical inferences but also is ineffective to the likes of deep leakage from gradients method. In this work, we propose proximal gradient split learning (PSGL) method for defense against the model inversion attacks. The proposed method intentionally attacks the IoMT data when undergoing the deep neural network training process at client side. We propose the use of proximal gradient method to recover gradient maps and a decision-level fusion strategy to improve the recognition performance. Extensive analysis show that the PGSL not only provides effective defense mechanism against the model inversion attacks but also helps in improving the recognition performance on publicly available datasets. We report 14.0$\%$, 17.9$\%$, and 36.9$\%$ gains in accuracy over reconstructed and adversarial attacked images, respectively.
IVOct 22, 2021
Multimodal-Boost: Multimodal Medical Image Super-Resolution using Multi-Attention Network with Wavelet TransformFayaz Ali Dharejo, Muhammad Zawish, Farah Deeba Yuanchun Zhou et al.
Deep learning based single image super resolution (SISR) algorithms has revolutionized the overall diagnosis framework by continually improving the architectural components and training strategies associated with convolutional neural networks (CNN) on low-resolution images. However, existing work lacks in two ways: i) the SR output produced exhibits poor texture details, and often produce blurred edges, ii) most of the models have been developed for a single modality, hence, require modification to adapt to a new one. This work addresses (i) by proposing generative adversarial network (GAN) with deep multi-attention modules to learn high-frequency information from low-frequency data. Existing approaches based on the GAN have yielded good SR results; however, the texture details of their SR output have been experimentally confirmed to be deficient for medical images particularly. The integration of wavelet transform (WT) and GANs in our proposed SR model addresses the aforementioned limitation concerning textons. While the WT divides the LR image into multiple frequency bands, the transferred GAN uses multi-attention and upsample blocks to predict high-frequency components. Additionally, we present a learning method for training domain-specific classifiers as perceptual loss functions. Using a combination of multi-attention GAN loss and a perceptual loss function results in an efficient and reliable performance. Applying the same model for medical images from diverse modalities is challenging, our work addresses (ii) by training and performing on several modalities via transfer learning. Using two medical datasets, we validate our proposed SR network against existing state-of-the-art approaches and achieve promising results in terms of SSIM and PSNR.
CRJul 27, 2021
Towards Industrial Private AI: A two-tier framework for data and model securitySunder Ali Khowaja, Kapal Dev, Nawab Muhammad Faseeh Qureshi et al.
With the advances in 5G and IoT devices, the industries are vastly adopting artificial intelligence (AI) techniques for improving classification and prediction-based services. However, the use of AI also raises concerns regarding privacy and security that can be misused or leaked. Private AI was recently coined to address the data security issue by combining AI with encryption techniques, but existing studies have shown that model inversion attacks can be used to reverse engineer the images from model parameters. In this regard, we propose a Federated Learning and Encryption-based Private (FLEP) AI framework that provides two-tier security for data and model parameters in an IIoT environment. We proposed a three-layer encryption method for data security and provide a hypothetical method to secure the model parameters. Experimental results show that the proposed method achieves better encryption quality at the expense of slightly increased execution time. We also highlight several open issues and challenges regarding the FLEP AI framework's realization.
CYJan 2, 2021
Internet of Everything enabled solution for COVID-19, its new variants and future pandemics: Framework, Challenges, and Research DirectionsSunder Ali Khowaja, Parus Khuwaja, Kapal Dev
After affecting the world in unexpected ways, COVID-19 has started mutating which is evident with the insurgence of its new variants. The governments, hospitals, schools, industries, and humans, in general, are looking for a potential solution in the vaccine which will eventually be available but its timeline for eradicating the virus is yet unknown. Several researchers have encouraged and recommended the use of good practices such as physical healthcare monitoring, immunity-boosting, personal hygiene, mental healthcare, and contact tracing for slowing down the spread of the virus. In this article, we propose the use of wearable/mobile sensors integrated with the Internet of Everything to cover the spectrum of good practices in an automated manner. We present hypothetical frameworks for each of the good practice modules and propose the COvid-19 Resistance Framework using the Internet of Everything (CORFIE) to tie all the individual modules in a unified architecture. We envision that CORFIE would be influential in assisting people with the new normal for current and future pandemics as well as instrumental in halting the economic losses, respectively. We also provide potential challenges and their probable solutions in compliance with the proposed CORFIE.
IVNov 1, 2020
Triage of Potential COVID-19 Patients from Chest X-ray Images using Hierarchical Convolutional NetworksKapal Dev, Sunder Ali Khowaja, Ankur Singh Bist et al.
The current COVID-19 pandemic has motivated the researchers to use artificial intelligence techniques for a potential alternative to reverse transcription-polymerase chain reaction (RT-PCR) due to the limited scale of testing. The chest X-ray (CXR) is one of the alternatives to achieve fast diagnosis but the unavailability of large-scale annotated data makes the clinical implementation of machine learning-based COVID detection difficult. Another issue is the usage of ImageNet pre-trained networks which does not extract reliable feature representations from medical images. In this paper, we propose the use of hierarchical convolutional network (HCN) architecture to naturally augment the data along with diversified features. The HCN uses the first convolution layer from COVIDNet followed by the convolutional layers from well-known pre-trained networks to extract the features. The use of the convolution layer from COVIDNet ensures the extraction of representations relevant to the CXR modality. We also propose the use of ECOC for encoding multiclass problems to binary classification for improving the recognition performance. Experimental results show that HCN architecture is capable of achieving better results in comparison to the existing studies. The proposed method can accurately triage potential COVID-19 patients through CXR images for sharing the testing load and increasing the testing capacity.
CVJan 21, 2019
Semantic Image Networks for Human Action RecognitionSunder Ali Khowaja, Seok-Lyong Lee
In this paper, we propose the use of a semantic image, an improved representation for video analysis, principally in combination with Inception networks. The semantic image is obtained by applying localized sparse segmentation using global clustering (LSSGC) prior to the approximate rank pooling which summarizes the motion characteristics in single or multiple images. It incorporates the background information by overlaying a static background from the window onto the subsequent segmented frames. The idea is to improve the action-motion dynamics by focusing on the region which is important for action recognition and encoding the temporal variances using the frame ranking method. We also propose the sequential combination of Inception-ResNetv2 and long-short-term memory network (LSTM) to leverage the temporal variances for improved recognition performance. Extensive analysis has been carried out on UCF101 and HMDB51 datasets which are widely used in action recognition studies. We show that (i) the semantic image generates better activations and converges faster than its original variant, (ii) using segmentation prior to approximate rank pooling yields better recognition performance, (iii) The use of LSTM leverages the temporal variance information from approximate rank pooling to model the action behavior better than the base network, (iv) the proposed representations can be adaptive as they can be used with existing methods such as temporal segment networks to improve the recognition performance, and (v) our proposed four-stream network architecture comprising of semantic images and semantic optical flows achieves state-of-the-art performance, 95.9% and 73.5% recognition accuracy on UCF101 and HMDB51, respectively.