IV Dec 19, 2022
Focal-UNet: UNet-like Focal Modulation for Medical Image Segmentation
MohammadReza Naderi, MohammadHossein Givkashi, Fatemeh Piri et al.
Recently, many attempts have been made to construct transformer-based U-shaped architectures, and new methods have been proposed that outperform their CNN-based rivals. However, serious problems such as blockiness and cropped edges in predicted masks remain because of transformers' patch partitioning operations. In this work, we propose a new U-shaped architecture for medical image segmentation with the help of the newly introduced focal modulation mechanism. The proposed architecture has asymmetric depths for the encoder and decoder. Due to the ability of the focal module to aggregate local and global features, our model can simultaneously benefit from the wide receptive field of transformers and the local view of CNNs. This helps the proposed method balance local and global feature usage and outperform Swin-UNet, one of the most powerful transformer-based U-shaped models. We achieved a 1.68% higher Dice score and a 0.89 better HD metric on the Synapse dataset. Also, with extremely limited data, we had a 4.25% higher Dice score on the NeoPolyp dataset. Our implementations are available at: https://github.com/givkashi/Focal-UNet
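The Dice score reported above measures the overlap between a predicted mask and the ground truth. A minimal pure-Python sketch of the standard definition, 2|A∩B| / (|A| + |B|), is given here for orientation; the function name, the empty-mask convention, and the toy masks are illustrative assumptions and are not taken from the Focal-UNet repository.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks (hypothetical data): 3 foreground pixels each, 2 shared.
pred = [[0, 1, 1], [0, 1, 0]]
target = [[0, 1, 0], [0, 1, 1]]
score = dice_score(pred, target)
print(round(score, 4))
```

In practice the score is usually computed per class and averaged over a dataset; this sketch handles a single binary mask pair only.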
IV Mar 12, 2023
Endoscopy Classification Model Using Swin Transformer and Saliency Map
Zahra Sobhaninia, Nasrin Abharian, Nader Karimi et al.
Endoscopy is a valuable tool for the early diagnosis of colon cancer. However, it requires the expertise of endoscopists and is a time-consuming process. In this work, we propose a new multi-label classification method, which considers two aspects of learning approaches (local and global views) for endoscopic image classification. The model consists of a Swin transformer branch and a modified VGG16 model as a CNN branch. To help the learning process of the CNN branch, the model employs saliency maps and endoscopy images and concatenates them. The results demonstrate that this method performed well for endoscopic medical images by utilizing local and global features of the images. Furthermore, quantitative evaluations prove the proposed method's superiority over state-of-the-art works.
CV Jun 12, 2023
Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions
MohammadHossein Givkashi, MohammadReza Naderi, Nader Karimi et al.
Image retargeting aims to alter the size of the image with attention to the contents. One of the main obstacles to training deep learning models for image retargeting is the need for a vast labeled dataset. Labeled datasets are unavailable for training deep learning models in the image retargeting tasks. As a result, we present a new supervised approach for training deep learning models. We use the original images as ground truth and create inputs for the model by resizing and cropping the original images. A second challenge is generating different image sizes in inference time. However, regular convolutional neural networks cannot generate images of different sizes than the input image. To address this issue, we introduced a new method for supervised learning. In our approach, a mask is generated to show the desired size and location of the object. Then the mask and the input image are fed to the network. Comparing image retargeting methods and our proposed method demonstrates the model's ability to produce high-quality retargeted images. Afterward, we compute the image quality assessment score for each output image based on different techniques and illustrate the effectiveness of our approach.
CV Jan 9, 2023
SFI-Swin: Symmetric Face Inpainting with Swin Transformer by Distinctly Learning Face Components Distributions
MohammadReza Naderi, MohammadHossein Givkashi, Nader Karimi et al.
Image inpainting consists of filling holes or missing parts of an image. Inpainting face images with symmetric characteristics is more challenging than inpainting a natural scene. None of the powerful existing models can fill out the missing parts of an image while considering the symmetry and homogeneity of the picture. Moreover, the metrics that assess a repaired face image quality cannot measure the preservation of symmetry between the rebuilt and existing parts of a face. In this paper, we intend to solve the symmetry problem in the face inpainting task by using multiple discriminators that check each face organ's reality separately and a transformer-based network. We also propose "symmetry concentration score" as a new metric for measuring the symmetry of a repaired face image. The quantitative and qualitative results show the superiority of our proposed method compared to some of the recently proposed algorithms in terms of the reality, symmetry, and homogeneity of the inpainted parts.
CV Dec 25, 2022
Adaptive Blind Watermarking Using Psychovisual Image Features
Arezoo PariZanganeh, Ghazaleh Ghorbanzadeh, Zahra Nabizadeh ShahreBabak et al.
With the growth of editing and sharing images through the internet, the importance of protecting the images' authorship has increased. Robust watermarking is a known approach to maintaining copyright protection. Robustness and imperceptibility are the two properties that watermarking methods try to maximize. Usually, there is a trade-off between them: increasing robustness lessens the imperceptibility of the watermark. This paper proposes an adaptive method that determines the strength of the watermark embedding in different parts of the cover image according to its texture and brightness. Adaptive embedding increases the robustness while preserving the quality of the watermarked image. Experimental results also show that the proposed method can effectively reconstruct the embedded payload under different kinds of common watermarking attacks. Our proposed method has shown good performance compared to a recent technique.
CV Sep 11, 2022
OAIR: Object-Aware Image Retargeting Using PSO and Aesthetic Quality Assessment
Mohammad Reza Naderi, Mohammad Hossein Givkashi, Nader Karimi et al.
Image retargeting aims at altering an image size while preserving important content and minimizing noticeable distortions. However, previous image retargeting methods create outputs that suffer from artifacts and distortions. Besides, most previous works attempt to retarget the background and foreground of the input image simultaneously. Simultaneous resizing of the foreground and background changes the aspect ratios of the objects, which is especially undesirable for human subjects. We propose a retargeting method that overcomes these problems. The proposed approach consists of the following steps. Firstly, an inpainting method uses the input image and the binary mask of foreground objects to produce a background image without any foreground objects. Secondly, the seam carving method resizes the background image to the target size. Then, a super-resolution method increases the input image quality, and we extract the foreground objects. Finally, the retargeted background and the extracted super-resolved objects are fed into a particle swarm optimization (PSO) algorithm. The PSO algorithm uses aesthetic quality assessment as its objective function to identify the best location and size for the objects to be placed in the background. We used image quality assessment and aesthetic quality assessment measures to show our superior results compared to popular image retargeting techniques.
IV Feb 17, 2023
Low Latency Video Denoising for Online Conferencing Using CNN Architectures
Altanai Bisht, Ana Carolina de Souza Mendes, Justin David Thoreson et al.
In this paper, we propose a pipeline for real-time video denoising with low runtime cost and high perceptual quality. The vast majority of denoising studies focus on image denoising. However, a minority of research works focusing on video denoising do so with higher performance costs to obtain higher quality while maintaining temporal coherence. The approach we introduce in this paper leverages the advantages of both image and video-denoising architectures. Our pipeline first denoises the keyframes or one-fifth of the frames using HI-GAN blind image denoising architecture. Then, the remaining four-fifths of the noisy frames and the denoised keyframe data are fed into the FastDVDnet video denoising model. The final output is rendered in the user's display in real-time. The combination of these low-latency neural network architectures produces real-time denoising with high perceptual quality with applications in video conferencing and other real-time media streaming systems. A custom noise detector analyzer provides real-time feedback to adapt the weights and improve the models' output.
CV Nov 15, 2022
Dynamic-Pix2Pix: Noise Injected cGAN for Modeling Input and Target Domain Joint Distributions with Limited Training Data
Mohammadreza Naderi, Nader Karimi, Ali Emami et al.
Learning to translate images from a source to a target domain, with applications such as converting simple line drawings to oil paintings, has attracted significant attention. The quality of translated images is directly related to two crucial issues. First, the consistency of the output distribution with that of the target is essential. Second, the generated output should have a high correlation with the input. Conditional Generative Adversarial Networks (cGANs) are the most common models for translating images. The performance of a cGAN drops when we use a limited training dataset. In this work, we increase the target distribution modeling ability of Pix2Pix (a form of cGAN) with the help of dynamic neural network theory. Our model has two learning cycles. The model learns the correlation between input and ground truth in the first cycle. Then, the model's architecture is refined in the second cycle to learn the target distribution from noise input. These processes are executed in each iteration of the training procedure. Helping the cGAN learn the target distribution from noise input results in better model generalization at test time and allows the model to fit almost perfectly to the target domain distribution. As a result, our model surpasses the Pix2Pix model in segmenting HC18 and Montgomery's chest x-ray images. Both qualitative results and Dice scores show the superiority of our model. Although our proposed method does not use thousands of additional samples for pretraining, it produces results comparable to the state-of-the-art methods for in-domain and out-of-domain generalization.
CY May 16, 2022
Does Crypto Kill? Relationship between Electricity Consumption Carbon Footprints and Bitcoin Transactions
Altanai Bisht, Arielle Wilson, Zachary Jeffreys et al.
Cryptocurrencies are gaining more popularity due to their security, making counterfeits impossible. However, these digital currencies have been criticized for creating a large carbon footprint due to their algorithmic complexity and decentralized system design for proof of work and mining. We hypothesize that the carbon footprint of cryptocurrency transactions has a higher dependency on carbon-rich fuel sources than green or renewable fuel sources. We provide a machine learning framework to model such transactions and correlate them with the electricity generation patterns to estimate and analyze their carbon cost.
LG Nov 26, 2025
RAVQ-HoloNet: Rate-Adaptive Vector-Quantized Hologram Compression
Shima Rafiei, Zahra Nabizadeh Shahr Babak, Shadrokh Samavi et al.
Holography offers significant potential for AR/VR applications, yet its adoption is limited by the high demands of data compression. Existing deep learning approaches generally lack rate adaptivity within a single network. We present RAVQ-HoloNet, a rate-adaptive vector quantization framework that achieves high-fidelity reconstructions at low and ultra-low bit rates, outperforming current state-of-the-art methods. At low bit rates, our method achieves a BD-Rate of -33.91% and a BD-PSNR of 1.02 dB relative to the best existing method, as demonstrated by the rate-distortion curve.
SY Oct 12, 2017
Power Aware Visual Sensor Network for Wildlife Habitat Monitoring
Mohsen Hooshmand, Shadrokh Samavi, S. M. Reza Soroushmehr
One of the fundamental issues in wireless sensor networks is conserving energy and thus extending the lifetime of the network. In this paper, we investigate the coverage problem in camera sensor networks by developing two algorithms that consider network lifetime. It is assumed that camera sensors are spread randomly over a large area in order to monitor a designated air space. To increase the lifetime of the network, the density of distributed sensors could be such that a subset of sensors can cover the required air space. As a sensor dies, another sensor should be selected to compensate for the dead one and reestablish complete coverage. This process should continue until complete coverage is no longer achievable by the existing sensors. Thereafter, a graceful degradation of the coverage is desirable. The goal is to extend the lifetime of the network while maintaining the maximum possible coverage of the designated air space. Since the selection of a subset of sensors for complete coverage of the target area is an NP-complete problem, we present a class of heuristics for this case. This is done by prioritizing the sensors based on their visual and communicative properties.
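Selecting a covering subset of sensors resembles the set cover problem, for which a greedy rule is a common heuristic. The sketch below is a generic greedy selection, not the paper's prioritization by visual and communicative properties; the sensor ids, the cell numbering, and the function name are hypothetical. Re-running it after a sensor dies illustrates both re-establishing coverage and graceful degradation.

```python
def greedy_cover(sensors, targets):
    """Pick sensors greedily until `targets` is covered or no sensor helps.

    sensors: dict mapping a sensor id to the set of cells it can see.
    Returns (chosen sensor ids in order, set of cells left uncovered).
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Sensor that sees the most still-uncovered cells (ties: first in dict order).
        best = max(sensors, key=lambda s: len(sensors[s] & uncovered), default=None)
        if best is None or not sensors[best] & uncovered:
            break  # nothing left that improves coverage
        chosen.append(best)
        uncovered -= sensors[best]
    return chosen, uncovered


# Hypothetical field of view of four cameras over cells 1..6.
sensors = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
full = greedy_cover(sensors, range(1, 7))

# Sensor "a" dies: re-select from the survivors; cell 2 can no longer be covered.
alive = {k: v for k, v in sensors.items() if k != "a"}
degraded = greedy_cover(alive, range(1, 7))
print(full, degraded)
```

A lifetime-aware variant would weight each candidate by remaining energy as well as by coverage gain; that weighting is left out here because the abstract does not specify it.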
IV Mar 18, 2023
Lossless Microarray Image Compression by Hardware Array Compactor
Anahita Banaei, Shadrokh Samavi, Ebrahim Nasr Esfahani
Microarray technology is a new and powerful tool for the concurrent monitoring of a large number of gene expressions. Each microarray experiment produces hundreds of images. Each digital image requires a large storage space. Hence, real-time processing of these images and transmission of them necessitates efficient and custom-made lossless compression schemes. In this paper, we offer a new architecture for the lossless compression of microarray images. In this architecture, we have used dedicated hardware for the separation of foreground pixels from background ones. By separating these pixels and using pipeline architecture, a higher lossless compression ratio has been achieved as compared to other existing methods.
CV Apr 9, 2022
Adaptive search area for fast motion estimation
S. M. Reza Soroushmehr, Shadrokh Samavi, Shahram Shirani
This paper suggests a new method for determining the search area for a motion estimation algorithm based on block matching. The search area is adaptively found in the proposed method for each frame block. This search area is similar to that of the full search (FS) algorithm but smaller for most blocks of a frame. Therefore, the proposed algorithm is analogous to FS in terms of regularity but has much less computational complexity. The temporal and spatial correlations among the motion vectors of blocks are used to find the search area. The matched block is chosen from a rectangular area that the prediction vectors set out. Simulation results indicate that the speed of the proposed algorithm is at least seven times better than the FS algorithm.
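To make the idea of a reduced, rectangular search area concrete, here is a minimal block-matching sketch. It runs an exhaustive search, as in FS, but only inside a rectangle spanned by a few prediction vectors. The SAD cost, the `margin` padding, and all function names are assumptions for illustration; how the paper derives its prediction vectors from temporal and spatial correlations is not reproduced.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )


def search_rectangle(predictors, margin=1):
    """Bounding rectangle of the prediction vectors, padded by `margin` (assumed)."""
    xs = [p[0] for p in predictors]
    ys = [p[1] for p in predictors]
    return (min(xs) - margin, max(xs) + margin), (min(ys) - margin, max(ys) + margin)


def block_match(cur, ref, x, y, size, dx_range, dy_range):
    """Exhaustive search over the given displacement rectangle only."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(dy_range[0], dy_range[1] + 1):
        for dx in range(dx_range[0], dx_range[1] + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, x, y, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost


# Synthetic 8x8 frames: `cur` is `ref` displaced by (dx=2, dy=1), clamped at the border.
ref = [[3 * r * r + c * c for c in range(8)] for r in range(8)]
cur = [[ref[min(r + 1, 7)][min(c + 2, 7)] for c in range(8)] for r in range(8)]

# Hypothetical predictors, e.g. motion vectors of neighbouring blocks.
dx_range, dy_range = search_rectangle([(1, 1), (2, 0)], margin=1)
mv, cost = block_match(cur, ref, 2, 2, 2, dx_range, dy_range)
print(mv, cost)
```

Here the rectangle holds 16 candidate displacements, whereas a full search with a range of ±7 would test 225, which is the kind of saving the abstract refers to.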
NA Oct 12, 2017
Reduction of Look Up Tables for Computation of Reciprocal of Square Roots
Shadrokh Samavi, Mohammad Reza Jahangir
Among many existing algorithms, convergence methods are the most popular means of computing square root and the reciprocal of square root of numbers. An initial approximation is required in these methods. Look up tables (LUT) are employed to produce the initial approximation. In this paper a number of methods are suggested to reduce the size of the look up tables. The precision of the initial approximation plays an important role in the quality of the final result. There are constraints for the use of a LUT in terms of its size and its access time. Therefore, the optimization of the LUTs must be done in a way to minimize hardware while offering acceptable convergence speed and exactitude.
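The interplay between table size and convergence can be seen in a small sketch: a look-up table supplies the seed for 1/sqrt(x), and Newton-Raphson iterations y ← y(1.5 − 0.5·m·y²) refine it. The 16-entry table, the midpoint sampling, and the normalization of the input to a mantissa in [1, 4) are assumptions chosen for illustration, not the table-reduction methods proposed in the paper.

```python
import math

TABLE_BITS = 4  # assumed table size: 2**4 = 16 entries
# Entry k holds 1/sqrt(m) at the midpoint of the k-th slice of [1, 4).
LUT = [
    1.0 / math.sqrt(1.0 + 3.0 * (k + 0.5) / 2**TABLE_BITS)
    for k in range(2**TABLE_BITS)
]


def rsqrt(x, iterations=2):
    """Approximate 1/sqrt(x) for x > 0 from a LUT seed plus Newton-Raphson steps."""
    # Write x = m * 2**e with m in [1, 4) and e even, so 1/sqrt(x) = (1/sqrt(m)) * 2**(-e/2).
    m, e = math.frexp(x)        # x = m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1       # m in [1, 2)
    if e % 2:                   # make the exponent even
        m, e = m * 2.0, e - 1   # m in [2, 4)
    index = min(int((m - 1.0) / 3.0 * len(LUT)), len(LUT) - 1)
    y = LUT[index]              # initial approximation from the table
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)  # each step roughly squares the relative error
    return math.ldexp(y, -e // 2)


for value in (0.01, 1.0, 2.0, 10.0):
    print(value, rsqrt(value), 1.0 / math.sqrt(value))
```

Because each iteration roughly squares the relative error, a coarser table can be offset by one more iteration; this is the size versus convergence-speed trade-off the abstract describes.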
IV Mar 17, 2024
A lightweight deep learning pipeline with DRDA-Net and MobileNet for breast cancer classification
Mahdie Ahmadi, Nader Karimi, Shadrokh Samavi
Accurate and early detection of breast cancer is essential for successful treatment. This paper introduces a novel deep-learning approach for improved breast cancer classification in histopathological images, a crucial step in diagnosis. Our method hinges on the Dense Residual Dual-Shuffle Attention Network (DRDA-Net), inspired by ShuffleNet's efficient architecture. DRDA-Net achieves exceptional accuracy across various magnification levels on the BreaKHis dataset, a breast cancer histopathology analysis benchmark. However, for real-world deployment, computational efficiency is paramount. To address computational efficiency, we integrate a pre-trained MobileNet model renowned for its lightweight design. MobileNet ensures fast execution even on devices with limited resources without sacrificing performance. This combined approach offers a promising solution for accurate breast cancer diagnosis, paving the way for faster and more accessible screening procedures.
MM Dec 8, 2023
High-Quality Live Video Streaming via Transcoding Time Prediction and Preset Selection
Zahra Nabizadeh Shahre-Babak, Nader Karimi, Krishna Rapaka et al.
Video streaming often requires transcoding content into different resolutions and bitrates to match the recipient's internet speed and screen capabilities. Video encoders like x264 offer various presets, each with different tradeoffs between transcoding time and rate-distortion performance. Choosing the best preset for video transcoding is difficult, especially for live streaming, as trying all the presets and choosing the best one is not feasible. One solution is to predict each preset's transcoding time and select the preset that ensures the highest quality while adhering to live streaming time constraints. Prediction of video transcoding time is also critical in minimizing streaming delays, deploying resource management algorithms, and load balancing. We propose a learning-based framework for predicting the transcoding time of videos across various presets. Our predictor's features for video transcoding time prediction are derived directly from the ingested stream, primarily from the header or metadata. As a result, only minimal additional delay is incurred for feature extraction, rendering our approach ideal for live-streaming applications. We evaluated our learning-based transcoding time prediction using a dataset of videos. The results demonstrate that our framework can accurately predict the transcoding time for different presets, with a mean absolute percentage error (MAPE) of nearly 5.0%. Leveraging these predictions, we then select the most suitable transcoding preset for live video streaming. Using our transcoding time prediction-based preset selection improved the Peak Signal-to-Noise Ratio (PSNR) by up to 5 dB.
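The selection step described above can be sketched in a few lines: given a predicted transcoding time per preset, choose the slowest preset that still fits the live deadline, and score the predictor with MAPE. The preset names are real x264 presets, but the timing numbers, the function names, and the assumption that slower presets give better rate-distortion performance are illustrative only; the learned predictor itself is not shown.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pick_preset(predicted_seconds, presets_fast_to_slow, budget_seconds):
    """Return the slowest preset whose predicted time fits the budget, else None."""
    chosen = None
    for name in presets_fast_to_slow:
        if predicted_seconds[name] <= budget_seconds:
            chosen = name  # later entries are slower, assumed higher quality
    return chosen


# Hypothetical predictions (seconds) for transcoding one 2-second segment.
presets = ["veryfast", "medium", "slow"]
predicted = {"veryfast": 0.8, "medium": 1.9, "slow": 3.2}

choice = pick_preset(predicted, presets, budget_seconds=2.0)
error = mape([2.0, 4.0], [2.1, 3.8])
print(choice, round(error, 2))
```

A deployed system would likely keep a safety margin below the deadline to absorb prediction error; the size of that margin is a tuning choice the abstract does not discuss.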
IV Nov 22, 2024
BrightVAE: Luminosity Enhancement in Underexposed Endoscopic Images
Farzaneh Koohestani, Zahra Nabizadeh, Nader Karimi et al.
The enhancement of image luminosity is especially critical in endoscopic images. Underexposed endoscopic images often suffer from reduced contrast and uneven brightness, significantly impacting diagnostic accuracy and treatment planning. Internal body imaging is challenging due to uneven lighting and shadowy regions. Enhancing such images is essential since precise image interpretation is crucial for patient outcomes. In this paper, we introduce BrightVAE, an architecture based on the hierarchical Vector Quantized Variational Autoencoder (hierarchical VQ-VAE) tailored explicitly for enhancing luminosity in low-light endoscopic images. Our architecture is designed to tackle the challenges inherent in endoscopic imaging, such as significant variations in illumination and obscured details due to poor lighting conditions. The proposed model emphasizes advanced feature extraction from three distinct viewpoints (various receptive fields, skip connections, and feature attentions) to robustly enhance image quality and support more accurate medical diagnoses. Through rigorous experimental analysis, we demonstrate the effectiveness of these techniques in enhancing low-light endoscopic images. We evaluate our architecture with three widely recognized metrics (SSIM, PSNR, and LPIPS) on the Endo4IE dataset, which consists exclusively of endoscopic images, and show significant advancements over state-of-the-art methods for enhancing luminosity in endoscopic imaging.
MM Apr 13, 2024
A Parametric Rate-Distortion Model for Video Transcoding
Maedeh Jamali, Nader Karimi, Shadrokh Samavi et al.
Over the past two decades, the surge in video streaming applications has been fueled by the increasing accessibility of the internet and the growing demand for network video. As users with varying internet speeds and devices seek high-quality video, transcoding becomes essential for service providers. In this paper, we introduce a parametric rate-distortion (R-D) transcoding model. Our model excels at predicting transcoding distortion at various rates without the need for encoding the video. This model serves as a versatile tool that can be used to achieve visual quality improvement (in terms of PSNR) via trans-sizing. Moreover, we use our model to identify visually lossless and near-zero-slope bitrate ranges for an ingest video. Having this information allows us to adjust the transcoding target bitrate while introducing visually negligible quality degradations. By utilizing our model in this manner, quality improvements up to 2 dB and bitrate savings of up to 46% of the original target bitrate are possible. Experimental results demonstrate the efficacy of our model in video transcoding rate distortion prediction.
CV Dec 23, 2023
Revealing Shadows: Low-Light Image Enhancement Using Self-Calibrated Illumination
Farzaneh Koohestani, Nader Karimi, Shadrokh Samavi
In digital imaging, enhancing visual content in poorly lit environments is a significant challenge, as images often suffer from inadequate brightness, hidden details, and an overall reduction in quality. This issue is especially critical in applications like nighttime surveillance, astrophotography, and low-light videography, where clear and detailed visual information is crucial. Our research addresses this problem by enhancing the illumination aspect of dark images. We have advanced past techniques by using varied color spaces to extract the illumination component, enhance it, and then recombine it with the other components of the image. By employing the Self-Calibrated Illumination (SCI) method, a strategy initially developed for RGB images, we effectively intensify and clarify details that are typically lost in low-light conditions. This method of selective illumination enhancement leaves the color information intact, thus preserving the color integrity of the image. Crucially, our method eliminates the need for paired images, making it suitable for situations where they are unavailable. Implementing the modified SCI technique represents a substantial shift from traditional methods, providing a refined and potent solution for low-light image enhancement. Our approach sets the stage for more complex image processing techniques and extends the range of possible real-world applications where accurate color representation and improved visibility are essential.
CV Feb 21, 2022
DGAFF: Deep Genetic Algorithm Fitness Formation for EEG Bio-Signal Channel Selection
Ghazaleh Ghorbanzadeh, Zahra Nabizadeh, Nader Karimi et al.
Brain-computer interface systems aim to facilitate human-computer interaction by directly translating brain signals for computers. Recently, using many electrodes has improved the performance of these systems. However, increasing the number of recording electrodes leads to additional time, hardware, and computational costs, besides undesired complications in the recording process. Channel selection has been utilized to decrease data dimension and eliminate irrelevant channels while reducing the noise effects. Furthermore, the technique lowers the time and computational costs in real-time applications. We present a channel selection method called Deep GA Fitness Formation (DGAFF), which combines a sequential search method with a genetic algorithm. The proposed method accelerates the convergence of the genetic algorithm and increases the system's performance. The system evaluation is based on a lightweight deep neural network that automates the whole model training process. The proposed method outperforms other channel selection methods in classifying motor imagery on the utilized dataset.
IV Dec 28, 2021
Brain Tumor Classification by Cascaded Multiscale Multitask Learning Framework Based on Feature Aggregation
Zahra Sobhaninia, Nader Karimi, Pejman Khadivi et al.
Brain tumor analysis in MRI images is a significant and challenging issue because misdiagnosis can lead to death. Diagnosis and evaluation of brain tumors in the early stages increase the probability of successful treatment. However, the complexity and variety of tumors, shapes, and locations make their segmentation and classification complex. In this regard, numerous researchers have proposed brain tumor segmentation and classification methods. This paper presents an approach that simultaneously segments and classifies brain tumors in MRI images using a framework that contains MRI image enhancement and tumor region detection. Finally, a network based on a multitask learning approach is proposed. Subjective and objective results indicate that, based on evaluation metrics, the segmentation and classification results are better than or comparable to the state-of-the-art.
CV Dec 17, 2021
Image Inpainting Using AutoEncoder and Guided Selection of Predicted Pixels
Mohammad H. Givkashi, Mahshid Hadipour, Arezoo PariZanganeh et al.
Image inpainting is an effective method to enhance distorted digital images. Different inpainting methods use the information of neighboring pixels to predict the value of missing pixels. Recently, deep neural networks have been used to learn structural and semantic details of images for inpainting purposes. In this paper, we propose a network for image inpainting. This network, similar to U-Net, extracts various features from images, leading to better results. We improved the final results by replacing the damaged pixels with the recovered pixels of the output images. Our experimental results show that this method produces high-quality results compared to the traditional methods.
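The final replacement step can be read as a mask-guided composite: pixels that were never damaged are copied from the input, and only the damaged positions take the network's prediction. The sketch below assumes a binary mask with 1 marking damaged pixels and single-channel images as nested lists; the function name and the toy values are hypothetical.

```python
def composite(damaged, predicted, mask):
    """Keep known pixels from `damaged`; take `predicted` only where mask is 1."""
    return [
        [p if m else d for d, p, m in zip(d_row, p_row, m_row)]
        for d_row, p_row, m_row in zip(damaged, predicted, mask)
    ]


# Toy 2x3 grayscale example (hypothetical values); 0 marks a hole in the input.
damaged = [[10, 0, 30], [40, 50, 0]]
predicted = [[11, 21, 29], [41, 49, 61]]   # network output for every pixel
mask = [[0, 1, 0], [0, 0, 1]]              # 1 = damaged position

result = composite(damaged, predicted, mask)
print(result)
```

Compositing this way guarantees that the undamaged regions of the input are returned unchanged, regardless of how well the network reconstructs them.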
IV Dec 7, 2021
Nuclei Segmentation in Histopathology Images using Deep Learning with Local and Global Views
Mahdi Arab Loodaricheh, Nader Karimi, Shadrokh Samavi
Digital pathology is one of the most significant developments in modern medicine. Pathological examinations are the gold standard of medical protocols and play a fundamental role in diagnosis. Recently, with the advent of digital scanners, tissue histopathology slides can be digitized and stored as digital images. As a result, digitized histopathological tissues can be used in computer-aided image analysis programs and machine learning techniques. Detection and segmentation of nuclei are essential steps in the diagnosis of cancers. Recently, deep learning has been used for nuclei segmentation. However, one of the problems in deep learning methods for nuclei segmentation is the lack of information from outside the patches. This paper proposes a deep learning-based approach for nuclei segmentation, which addresses the problem of misprediction in patch border areas. We use both local and global patches to predict the final segmentation map. Experimental results on the Multi-organ histopathology dataset demonstrate that our method outperforms the baseline nuclei segmentation and popular segmentation models.
CV Sep 12, 2021
MSGDD-cGAN: Multi-Scale Gradients Dual Discriminator Conditional Generative Adversarial Network
Mohammadreza Naderi, Zahra Nabizadeh, Nader Karimi et al.
Conditional Generative Adversarial Networks (cGANs) have been used in many image processing tasks. However, they still have serious problems maintaining the balance between conditioning the output on the input and creating the output with the desired distribution based on the corresponding ground truth. The traditional cGANs, similar to most conventional GANs, suffer from vanishing gradients, which backpropagate from the discriminator to the generator. Moreover, the traditional cGANs are sensitive to architectural changes due to the previously mentioned gradient problems. Therefore, balancing the architecture of the cGANs is almost impossible. Recently, MSG-GAN has been proposed to stabilize the performance of GANs by applying multiple connections between the generator and discriminator. In this work, we propose a method called MSGDD-cGAN, which first stabilizes the performance of the cGANs using multi-connection gradient flow. Secondly, the proposed network architecture balances the correlation of the output to the input and the fitness of the output to the target distribution. This balance is generated by using the proposed dual discrimination procedure. We tested our model on the segmentation of fetal ultrasound images. Our model shows a 3.18% increase in the F1 score compared to the pix2pix version of cGANs.
IV Aug 19, 2021
Segmentation of Lungs COVID Infected Regions by Attention Mechanism and Synthetic Data
Parham Yazdekhasty, Ali Zindari, Zahra Nabizadeh-ShahreBabak et al.
Coronavirus has caused hundreds of thousands of deaths. Fatalities could decrease if every patient could get suitable treatment by the healthcare system. Machine learning, especially computer vision methods based on deep learning, can help healthcare professionals diagnose and treat COVID-19 infected cases more efficiently. Hence, infected patients can get better service from the healthcare system and decrease the number of deaths caused by the coronavirus. This research proposes a method for segmenting infected lung regions in a CT image. For this purpose, a convolutional neural network with an attention mechanism is used to detect infected areas with complex patterns. Attention blocks improve the segmentation accuracy by focusing on informative parts of the image. Furthermore, a generative adversarial network generates synthetic images for data augmentation and expansion of small available datasets. Experimental results show the superiority of the proposed method compared to some existing procedures.
CV Jun 16, 2021
Compound Frechet Inception Distance for Quality Assessment of GAN Created Images
Eric J. Nunn, Pejman Khadivi, Shadrokh Samavi
Generative adversarial networks, or GANs, are a type of generative modeling framework. GANs involve a pair of neural networks engaged in a competition, iteratively creating fake data that is indistinguishable from the real data. One notable application of GANs is developing fake human faces, also known as "deep fakes," due to the deep learning algorithms at the core of the GAN framework. Measuring the quality of the generated images is inherently subjective, but attempts to objectify quality using standardized metrics have been made. One example of an objective metric is the Frechet Inception Distance (FID), which measures the difference between distributions of feature vectors for two separate datasets of images. There are situations in which images with low perceptual quality are not assigned appropriate FID scores. We propose to improve the robustness of the evaluation process by integrating lower-level features to cover a wider array of visual defects. Our proposed method integrates three levels of feature abstractions to evaluate the quality of generated images. Experimental evaluations show better performance of the proposed method for distorted images.
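For reference, FID is usually computed as the Fréchet distance between two Gaussians fitted to the feature vectors of the real and the generated images. The notation below is ours: $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denote the feature mean and covariance of the real and generated sets. How the compound metric combines its three feature levels is not stated in the abstract and is not shown here.

```latex
\mathrm{FID} \;=\; \lVert \mu_r - \mu_g \rVert_2^2
  \;+\; \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer; the value is zero when the means and covariances coincide.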
MM · May 24, 2021
Robust Watermarking using Diffusion of Logo into Autoencoder Feature Maps
Maedeh Jamali, Nader Karim, Pejman Khadivi et al.
Digital content has grown dramatically in recent years, leading to increased attention to copyright. Image watermarking is considered one of the most popular methods for copyright protection. With the recent advancements in applying deep neural networks to image processing, these networks have also been used in image watermarking. Robustness and imperceptibility are two competing requirements of watermarking methods, and a trade-off between them must be struck. In this paper, we propose to use an end-to-end network for watermarking. We use a convolutional neural network (CNN) to control the embedding strength based on the image content. Dynamic embedding helps the network to have the lowest effect on the visual quality of the watermarked image. Different image processing attacks are simulated as a network layer to improve the robustness of the model. Our method is a blind watermarking approach that replicates the watermark string to create a matrix of the same size as the input image. Instead of diffusing the watermark data into the input image, we inject the data into the feature space and force the network to do this in regions that increase the robustness against various attacks. Experimental results show the superiority of the proposed method in terms of imperceptibility and robustness compared to the state-of-the-art algorithms.
IV · Jan 21, 2021
Weighted Fuzzy-Based PSNR for Watermarking
Maedeh Jamali, Nader Karimi, Shadrokh Samavi
One of the problems of conventional visual quality evaluation criteria such as PSNR and MSE is the lack of appropriate standards based on the human visual system (HVS). They are calculated from the differences between corresponding pixels in the original and manipulated images. Hence, they do not reliably reflect the perceived image quality. Watermarking is an image processing application in which the image's visual quality is an essential criterion for its evaluation. Watermarking requires a criterion based on the HVS that provides more accurate values than conventional measures such as PSNR. This paper proposes a weighted fuzzy-based criterion that tries to find essential parts of an image based on the HVS. These parts are then given larger weights in computing the final value of PSNR. We compare our results against standard PSNR, and our experiments show considerable differences.
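The idea of giving some pixels larger weights in PSNR can be illustrated with a minimal sketch. The function names and the `weights` map are hypothetical; the abstract does not say how the fuzzy HVS-based weights are derived, so here they are simply supplied by the caller.

```python
import math

def psnr(original, distorted, peak=255.0):
    """Standard PSNR in dB for two equally sized 2-D lists of pixel values."""
    n = sum(len(row) for row in original)
    mse = sum((o - d) ** 2
              for ro, rd in zip(original, distorted)
              for o, d in zip(ro, rd)) / n
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def weighted_psnr(original, distorted, weights, peak=255.0):
    """PSNR from a weighted MSE; `weights` is a per-pixel importance map
    (an assumption for illustration, not the paper's fuzzy weighting)."""
    total_w = sum(w for row in weights for w in row)
    wmse = sum(w * (o - d) ** 2
               for ro, rd, rw in zip(original, distorted, weights)
               for o, d, w in zip(ro, rd, rw)) / total_w
    return math.inf if wmse == 0 else 10.0 * math.log10(peak ** 2 / wmse)

# One altered pixel; the weight map marks that pixel as perceptually important.
orig = [[100, 100], [100, 100]]
dist = [[100, 110], [100, 100]]
importance = [[1, 3], [1, 1]]
plain = psnr(orig, dist)
weighted = weighted_psnr(orig, dist, importance)
```

With uniform weights the two functions agree; raising the weight of a distorted pixel lowers the weighted score, which is the behavior the abstract describes.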
IV · Nov 1, 2020
Bifurcated Autoencoder for Segmentation of COVID-19 Infected Regions in CT Images
Parham Yazdekhasty, Ali Zindar, Zahra Nabizadeh-ShahreBabak et al.
The new coronavirus infection has shocked the world since early 2020 with its aggressive outbreak. Rapid detection of the disease saves lives, and relying on medical imaging (Computed Tomography and X-ray) to detect infected lungs has been shown to be effective. Deep learning and convolutional neural networks have been used for image analysis in this context. However, accurate identification of infected regions has proven challenging for two main reasons. Firstly, the characteristics of infected areas differ in different images. Secondly, insufficient training data makes it challenging to train various machine learning algorithms, including deep-learning models. This paper proposes an approach to segment lung regions infected by COVID-19 to help cardiologists diagnose the disease more accurately, more quickly, and more manageably. We propose a bifurcated 2-D model for two types of segmentation. This model uses a shared encoder and a bifurcated connection to two separate decoders. One decoder is for segmentation of the healthy region of the lungs, while the other is for the segmentation of the infected regions. Experiments on publicly available images show that the bifurcated structure segments infected regions of the lungs better than state-of-the-art methods.
IV · Nov 1, 2020
Brain Tumor Classification Using Medial Residual Encoder Layers
Zahra SobhaniNia, Nader Karimi, Pejman Khadivi et al.
According to the World Health Organization (WHO), cancer is the second leading cause of death worldwide, responsible for over 9.5 million deaths in 2018 alone. Brain tumors account for one out of every four cancer deaths. Therefore, accurate and timely diagnosis of brain tumors will lead to more effective treatments. Physicians can classify brain tumors only through a biopsy, which requires brain surgery, and after the type of tumor is diagnosed, a treatment plan is considered for the patient. Automatic systems based on machine learning algorithms can allow physicians to diagnose brain tumors with noninvasive measures. To date, several image classification approaches have been proposed to aid diagnosis and treatment. For brain tumor classification in this work, we offer a system based on deep learning, containing encoder blocks. These blocks are fed with post-max-pooling features as residual learning. Our approach shows promising results by improving the tumor classification accuracy in magnetic resonance imaging (MRI) images using a limited medical image dataset. Experimental evaluations of this model on a dataset consisting of 3064 MR images show 95.98% accuracy, which is better than previous studies on this database.
IV · Sep 1, 2020
Classification of Diabetic Retinopathy Using Unlabeled Data and Knowledge Distillation
Sajjad Abbasi, Mohsen Hajabdollahi, Pejman Khadivi et al.
Knowledge distillation allows transferring knowledge from a pre-trained model to another. However, it suffers from limitations, including the constraint that the two models need to be architecturally similar. Knowledge distillation addresses some of the shortcomings associated with transfer learning by generalizing a complex model to a lighter model. However, some parts of the knowledge may not be sufficiently distilled. In this paper, a novel knowledge distillation approach using transfer learning is proposed. The proposed method transfers the entire knowledge of a model to a new smaller one. To accomplish this, unlabeled data are used in an unsupervised manner to transfer the maximum amount of knowledge to the new slimmer model. The proposed method can be beneficial in medical image analysis, where labeled data are typically scarce. The proposed approach is evaluated in the context of classifying images for diagnosing Diabetic Retinopathy on two publicly available datasets, Messidor and EyePACS. Simulation results demonstrate that the approach is effective in transferring knowledge from a complex model to a lighter one. Furthermore, experimental results illustrate that the performance of different small models is improved significantly using unlabeled data and knowledge distillation.
SP · Jul 24, 2020
Selection of Proper EEG Channels for Subject Intention Classification Using Deep Learning
Ghazale Ghorbanzade, Zahra Nabizadeh-ShahreBabak, Shadrokh Samavi et al.
Brain signals could be used to control devices to assist individuals with disabilities. Signals such as electroencephalograms are complicated and hard to interpret. A set of signals is collected and should be classified to identify the intention of the subject. Different approaches have tried to reduce the number of channels before sending them to a classifier. We propose a deep learning-based method for selecting an informative subset of channels that produces high classification accuracy. The proposed network could be trained for an individual subject to select an appropriate set of channels. Reducing the number of channels could reduce the complexity of brain-computer-interface devices. Our method could find a subset of channels for which the accuracy is comparable with that of a model trained on all channels. Hence, our model's temporal and power costs are low, while its accuracy is kept high.
MM · May 11, 2020
Hardware Implementation of Adaptive Watermarking Based on Local Spatial Disorder Analysis
Mohsen Hajabdolahi, Nader Karimi, Shahram Shirani et al.
With the increasing use of the internet and the ease of exchange of multimedia content, the protection of ownership rights has become a significant concern. Watermarking is an efficient means for this purpose. In many applications, real-time watermarking is required, which demands a hardware implementation of a low-complexity, robust algorithm. In this paper, an adaptive watermarking method is presented, which uses embedding in different bit-planes to achieve transparency and robustness. Local disorder of pixels is analyzed to control the strength of the watermark. A new low-complexity method for disorder analysis is proposed, and its hardware implementation is presented. An embedding method is proposed that causes lower degradation in the watermarked image. Also, the performance of the proposed watermarking architecture is improved by a pipeline structure and is tested on an FPGA device. Results show that the algorithm produces transparent and robust watermarked images. The synthesis report from the FPGA implementation illustrates a low-complexity hardware structure.
MM · Apr 6, 2020
Robust Wavelet-Based Watermarking Using Dynamic Strength Factor
Mahsa Kadkhodaei, Shadrokh Samavi
In unsecured network environments, ownership protection of digital content, such as images, is a growing concern. Different watermarking methods have been proposed to address the copyright protection of digital materials. Watermarking methods are challenged by the conflicting requirements of imperceptibility and robustness. While embedding a watermark with a high strength factor increases robustness, it also decreases imperceptibility of the watermark. Thus, embedding in visually less sensitive regions, i.e., complex image blocks, could satisfy both requirements. This paper presents a new wavelet-based watermarking technique using an adaptive strength factor to trade off between watermark transparency and robustness. We measure variations of each image block to adaptively set a strength factor for embedding the watermark in that block. The decoder, in turn, uses the selected coefficients to safely extract the watermark through a voting algorithm. The proposed method shows better results in terms of PSNR and BER in comparison to recent methods for attacks such as Median Filter, Gaussian Filter, and JPEG compression.
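As a rough illustration of setting the strength factor from block variation, the sketch below maps the pixel variance of a block to an embedding strength. The linear mapping and every constant (`alpha_min`, `alpha_max`, `var_cap`) are assumptions chosen for illustration; the abstract does not specify the actual rule, and the wavelet embedding itself is omitted.

```python
def block_variance(block):
    """Population variance of the pixel values in a 2-D block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def strength_factor(block, alpha_min=0.5, alpha_max=4.0, var_cap=1000.0):
    """Map block variance to an embedding strength.

    Smooth blocks get alpha_min, highly textured blocks get alpha_max.
    All constants are illustrative, not values from the paper.
    """
    ratio = min(block_variance(block) / var_cap, 1.0)
    return alpha_min + (alpha_max - alpha_min) * ratio

flat_block = [[128, 128], [128, 128]]     # visually sensitive, weak embedding
textured_block = [[0, 255], [255, 0]]     # visually less sensitive, strong embedding
weak = strength_factor(flat_block)
strong = strength_factor(textured_block)
```

The design choice mirrors the abstract's reasoning: complex blocks hide a stronger watermark, so they can carry more robustness without hurting imperceptibility.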
CV · Mar 27, 2020
Acceleration of Convolutional Neural Network Using FFT-Based Split Convolutions
Kamran Chitsaz, Mohsen Hajabdollahi, Nader Karimi et al.
Convolutional neural networks (CNNs) have a large number of variables and hence suffer from a complexity problem in their implementation. Different methods and techniques, such as quantization and pruning, have been developed to alleviate the problem of CNN complexity. Among the different simplification methods, computation in the Fourier domain is regarded as a new paradigm for the acceleration of CNNs. Recent studies on Fast Fourier Transform (FFT) based CNNs aim at simplifying the computations required for FFT. However, there is a lot of room for reducing the computational complexity of FFT. In this paper, a new method for CNN processing in the FFT domain is proposed, which is based on input splitting. There are problems in the computation of FFT using small kernels in situations such as CNNs. Splitting can be considered an effective solution for such issues arising from small kernels. Using splitting, redundancy such as overlap-and-add is reduced, and efficiency is increased. A hardware implementation of the proposed FFT method, as well as different analyses of the complexity, are performed to demonstrate the proper performance of the proposed method.
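The overlap-and-add step that the abstract refers to can be sketched in one dimension. This is the conventional block-wise scheme, not the paper's split-convolution method: the input is cut into blocks, each block is convolved with the kernel, and the overlapping tails are summed. To stay dependency-free, each block is convolved directly here; in an FFT-based implementation that inner convolution would be done in the Fourier domain. The function names are hypothetical.

```python
def convolve(signal, kernel):
    """Direct full linear convolution of two 1-D sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def overlap_add(signal, kernel, block_size):
    """Convolve block by block and add the overlapping tails of each result."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        # Each block result is len(kernel) - 1 samples longer than the block,
        # so its tail overlaps the head of the next block's result.
        for offset, value in enumerate(convolve(block, kernel)):
            out[start + offset] += value
    return out

signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
direct = convolve(signal, kernel)
blocked = overlap_add(signal, kernel, block_size=2)
```

Both calls produce the same sequence; the extra additions over the overlapping tails are the redundancy that splitting schemes try to reduce.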
IV · Feb 26, 2020
Region of Interest Identification for Brain Tumors in Magnetic Resonance Images
Fateme Mostafaie, Reihaneh Teimouri, Zahra Nabizadeh et al.
Glioma is a common type of brain tumor, and its accurate detection plays a vital role in the diagnosis and treatment process. Despite advances in medical image analysis, accurate tumor segmentation in brain magnetic resonance (MR) images remains a challenge due to variations in tumor texture, position, and shape. In this paper, we propose a fast, automated method, with light computational complexity, to find the smallest bounding box around the tumor region. This region of interest can be used as a preprocessing step in training networks for subregion tumor segmentation. By adopting the outputs of this algorithm, redundant information is removed; hence the network can focus on learning notable features related to the subregions' classes. The proposed method has six main stages, of which brain segmentation is the most vital. Expectation-maximization (EM) and K-means algorithms are used for brain segmentation. The proposed method is evaluated on the BraTS 2015 dataset, and the average DICE score obtained is 0.73, which is an acceptable result for this application.
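Two quantities named in the abstract above, the DICE score and the smallest bounding box around a region, can be sketched for binary masks as follows. This is a generic illustration with hypothetical function names, not the paper's six-stage pipeline.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks (2-D lists of 0/1)."""
    inter = sum(p & t for rp, rt in zip(pred, truth) for p, t in zip(rp, rt))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * inter / total

def bounding_box(mask):
    """Smallest inclusive (top, left, bottom, right) box around all nonzero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return rows[0], min(cols), rows[-1], max(cols)

pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
truth = [[0, 1, 0],
         [0, 1, 0],
         [0, 0, 0]]
score = dice_score(pred, truth)   # 2 * 2 / (3 + 2) = 0.8
box = bounding_box(pred)          # (0, 1, 1, 2)
```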
CV · Feb 9, 2020
Unlabeled Data Deployment for Classification of Diabetic Retinopathy Images Using Knowledge Transfer
Sajjad Abbasi, Mohsen Hajabdollahi, Nader Karimi et al.
Convolutional neural networks (CNNs) are highly beneficial for medical image processing. Medical images are plentiful, but there is a lack of annotated data. Transfer learning is used to address the lack of labeled data and gives CNNs better training capability. Transfer learning can be used in many different medical applications; however, the model under transfer should have the same size as the original network. Knowledge distillation was recently proposed to transfer the knowledge of one model to another and can be useful for covering the shortcomings of transfer learning. However, some parts of the knowledge may not be distilled by knowledge distillation. In this paper, a novel knowledge distillation approach using transfer learning is proposed to transfer the whole knowledge of a model to another one. The proposed method can be beneficial and practical for medical image analysis, in which only a small amount of labeled data is available. The proposed process is tested for diabetic retinopathy classification. Simulation results demonstrate that, using the proposed method, knowledge of an extensive network can be transferred to a smaller model.
CV · Feb 9, 2020
Splitting Convolutional Neural Network Structures for Efficient Inference
Emad MalekHosseini, Mohsen Hajabdollahi, Nader Karimi et al.
For convolutional neural networks (CNNs) that have a large volume of input data, memory management becomes a major concern. Memory cost reduction, which can be realized through techniques such as feature map pruning and input data splitting, can be an effective way to deal with this problem. Among the various methods in this area of research, splitting the network structure is an interesting direction, and only a few works have addressed it. In this study, the problem of reducing memory utilization using network structure splitting is addressed. A new technique is proposed to split the network structure into small parts that consume less memory than the original network. The split parts can be processed almost separately, which plays an essential role in better memory management. The split approach has been tested on two well-known network structures, VGG16 and ResNet18, for the classification of CIFAR10 images. Simulation results show that the splitting method reduces both the number of computational operations and the amount of memory consumption.
CV · Feb 9, 2020
Convolutional Neural Network Pruning Using Filter Attenuation
Morteza Mousa-Pasandi, Mohsen Hajabdollahi, Nader Karimi et al.
Filters are the essential elements in convolutional neural networks (CNNs). Filters correspond to the feature maps and account for the main part of the computational and memory requirements of CNN processing. In filter pruning methods, a filter with all of its components, including channels and connections, is removed. The removal of a filter can cause a drastic change in the network's performance. Also, the removed filters cannot come back to the network structure. We want to address these problems in this paper. We propose a CNN pruning method based on filter attenuation in which weak filters are not directly removed. Instead, weak filters are attenuated and gradually removed. In the proposed attenuation approach, weak filters are not abruptly removed, and there is a chance for these filters to return to the network. The filter attenuation method is assessed using the VGG model for the CIFAR10 image classification task. Simulation results show that the filter attenuation works with different pruning criteria, and better results are obtained in comparison with the conventional pruning methods.
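A minimal sketch of one attenuation step, under stated assumptions: filters are flattened to plain lists of weights, weakness is measured by the L1 norm (one common pruning criterion; the abstract says several criteria work), and the attenuation factor `alpha` is an illustrative constant. None of these specifics come from the paper.

```python
def l1_norm(filt):
    """Sum of absolute weights, used here as the (assumed) importance score."""
    return sum(abs(w) for w in filt)

def attenuate_weak_filters(filters, num_weak, alpha=0.5):
    """Return a copy in which the num_weak lowest-norm filters are scaled by alpha.

    Nothing is deleted: applied once per training step, a weak filter shrinks
    gradually, and one whose norm recovers during training stops being selected.
    """
    order = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    weak = set(order[:num_weak])
    return [[w * alpha for w in f] if i in weak else list(f)
            for i, f in enumerate(filters)]

filters = [[1.0, -1.0], [0.1, 0.2], [3.0, 0.5]]
attenuated = attenuate_weak_filters(filters, num_weak=1)
```

Because weights are scaled rather than zeroed, subsequent gradient updates can still grow a filter back, which is the "chance to return" the abstract mentions.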
IV · Feb 5, 2020
Brain Tumor Segmentation by Cascaded Deep Neural Networks Using Multiple Image Scales
Zahra Sobhaninia, Safiyeh Rezaei, Nader Karimi et al.
Intracranial tumors are groups of cells that usually grow uncontrollably. One out of four cancer deaths is due to brain tumors. Early detection and evaluation of brain tumors is an essential preventive medical step that is performed by magnetic resonance imaging (MRI). Many segmentation techniques exist for this purpose. Low segmentation accuracy is the main drawback of existing methods. In this paper, we use a deep learning method to boost the accuracy of tumor segmentation in MR images. A cascade approach is used with multiple scales of images to induce both local and global views and help the network reach higher accuracy. Our experimental results show that using multiple scales and two cascaded networks is advantageous.
CV · Jan 13, 2020
Modeling of Pruning Techniques for Deep Neural Networks Simplification
Morteza Mousa Pasandi, Mohsen Hajabdollahi, Nader Karimi et al.
Convolutional Neural Networks (CNNs) suffer from different issues, such as computational complexity and the number of parameters. In recent years, pruning techniques have been employed to reduce the number of operations and the model size in CNNs. Different pruning methods have been proposed, which are based on pruning the connections, channels, and filters. Various techniques and tricks accompany pruning methods, and there is no unifying framework to model all the pruning methods. In this paper, pruning methods are investigated, and a general model that covers the majority of pruning techniques is proposed. The advantages and disadvantages of the pruning methods can be identified, and all of them can be summarized under this model. The final goal of this model is to provide a general approach for all of the pruning methods with different structures and applications.
CV · Jan 10, 2020
Image Inpainting by Multiscale Spline Interpolation
Ghazale Ghorbanzade, Zahra Nabizadeh, Nader Karimi et al.
Recovering the missing regions of an image is a task that is called image inpainting. Depending on the shape of missing areas, different methods are presented in the literature. One of the challenges of this problem is extracting features that lead to better results. Experimental results show that both global and local features are useful for this purpose. In this paper, we propose a multi-scale image inpainting method that utilizes both local and global features. The first step of this method is to determine how many scales we need to use, which depends on the width of the lines in the map of the missing region. Then we apply adaptive image inpainting to the damaged areas of the image, and the lost pixels are predicted. Each scale is inpainted and the result is resized to the original size. Then a voting process produces the final result. The proposed method is tested on damaged images with scratches and creases. The metric that we use to evaluate our approach is PSNR. On average, we achieved 1.2 dB improvement over some existing inpainting approaches.
MM · Jan 9, 2020
Adaptive Control of Embedding Strength in Image Watermarking using Neural Networks
Mahnoosh Bagheri, Majid Mohrekesh, Nader Karimi et al.
Digital watermarking has been widely used in different applications such as copyright protection of digital media, including audio, image, and video files. Two opposing criteria, robustness and transparency, are the goals of watermarking methods. In this paper, we propose a framework for determining the appropriate embedding strength factor. The framework can use most DWT- and DCT-based blind watermarking approaches. We use Mask R-CNN on the COCO dataset to find a good strength factor for each sub-block. Experiments show that this method is robust against different attacks and has good transparency.
CV · Dec 31, 2019
Image Seam-Carving by Controlling Positional Distribution of Seams
Mahdi Ahmadi, Nader Karimi, Shadrokh Samavi
Image retargeting is a new image processing task that changes the aspect ratio of images. One of the most famous image-retargeting algorithms is seam-carving. Although seam-carving is fast and straightforward, it usually distorts the images. In this paper, we introduce a new seam-carving algorithm that not only keeps the simplicity of the original seam-carving but also avoids the unwanted distortion usually present in the original method. The positional distribution of seams is introduced. We show that the proposed method outperforms the original seam-carving in terms of retargeted image quality assessment and seam coagulation measures.
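For context, the core of the original seam-carving algorithm is a dynamic program that finds the connected top-to-bottom path of least total energy. The sketch below shows that baseline step only; it does not implement the positional-distribution control that the paper proposes, and the energy map is assumed to be given (for example, from gradient magnitudes).

```python
def min_vertical_seam(energy):
    """Return one column index per row for the minimum-energy vertical seam.

    `energy` is a 2-D list; consecutive seam pixels differ by at most one column.
    """
    h, w = len(energy), len(energy[0])
    # cost[r][c] = cheapest cumulative energy of any seam ending at (r, c).
    cost = [list(energy[0])]
    for r in range(1, h):
        prev = cost[-1]
        row = []
        for c in range(w):
            lo, hi = max(c - 1, 0), min(c + 1, w - 1)
            row.append(energy[r][c] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest cell in the last row.
    c = min(range(w), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(h - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, w - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    return seam[::-1]

energy = [[5, 1, 5],
          [5, 5, 1],
          [5, 1, 5]]
seam = min_vertical_seam(energy)   # [1, 2, 1]
```

Removing the returned pixel from each row narrows the image by one column. Repeating this greedily is what can make seams cluster in one place, the effect that the abstract's seam coagulation measure evaluates.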
CV · Dec 31, 2019
Modeling Neural Architecture Search Methods for Deep Networks
Emad Malekhosseini, Mohsen Hajabdollahi, Nader Karimi et al.
There are many research works on designing architectures for deep neural networks (DNNs), which are named neural architecture search (NAS) methods. Although there are many automatic and manual techniques for NAS problems, there is no unifying model in which these NAS methods can be explored and compared. In this paper, we propose a general abstraction model for NAS methods. By using the proposed framework, it is possible to compare different design approaches for categorizing and identifying critical areas of interest in designing DNN architectures. Also, under this framework, different methods in the NAS area are summarized; hence a better view of their advantages and disadvantages is possible.
CV · Dec 31, 2019
Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation
Sajjad Abbasi, Mohsen Hajabdollahi, Nader Karimi et al.
Knowledge distillation (KD) is a new method for transferring the knowledge of a structure under training to another one. The typical application of KD is training a small model (the student) using soft labels produced by a complex model (the teacher). Because of the novel idea introduced in KD, its notion has recently been used in different methods, such as compression and processes that aim to enhance model accuracy. Although different techniques have been proposed in the area of KD, there is a lack of a model that generalizes KD techniques. In this paper, various studies in the scope of KD are investigated and analyzed to build a general model for KD. All the methods and techniques in KD can be summarized through the proposed model. By utilizing the proposed model, different methods in KD are better investigated and explored. The advantages and disadvantages of different approaches in KD can be better understood, and developing new strategies for KD becomes possible. Using the proposed model, different KD methods are represented in an abstract view.
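The "soft labels" setup described above is commonly realized as a temperature-softened softmax on both models plus a KL-divergence loss. The sketch below shows that widely used formulation as an illustration; the temperature value and function names are assumptions, and it is not the general model the paper proposes.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is the usual convention for keeping gradient magnitudes
    comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)   # soft labels from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [5.0, 1.0, -2.0]
student = [2.0, 2.0, 0.0]
loss = distillation_loss(student, teacher)
```

A higher temperature flattens the teacher's distribution, so the student also learns from the relative scores of the non-winning classes.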
CV · Dec 27, 2019
A General Framework for Saliency Detection Methods
Fateme Mostafaie, Zahra Nabizadeh, Nader Karimi et al.
Saliency detection is one of the most challenging problems in image analysis and computer vision. Many approaches propose different architectures based on the psychological and biological properties of the human visual attention system. However, there is still no abstract framework that summarizes the existing methods. In this paper, we offer a general framework for saliency models, which consists of five main steps: pre-processing, feature extraction, saliency map generation, saliency map combination, and post-processing. We also study different saliency models containing each step and compare their performance. This framework helps researchers gain a comprehensive view when studying new methods.
CV · Dec 27, 2019
An Abstraction Model for Semantic Segmentation Algorithms
Reihaneh Teymoori, Zahra Nabizadeh, Nader Karimi et al.
Semantic segmentation classifies each pixel in the image. Due to its advantages, semantic segmentation is used in many tasks, such as cancer detection, robot-assisted surgery, satellite image analysis, and self-driving cars. Accuracy and efficiency are the two crucial goals for this purpose, and several state-of-the-art neural networks exist. By employing different techniques, new solutions have been presented in each method to increase efficiency and accuracy and reduce costs. However, the diversity of the implemented approaches for semantic segmentation makes it difficult for researchers to achieve a comprehensive view of the field. In this paper, an abstraction model for semantic segmentation offers a comprehensive view of the field. The proposed framework consists of four general blocks that cover the operation of the majority of semantic segmentation methods. We also compare different approaches and analyze each of the four abstraction blocks' importance in each method's operation.
CV · Dec 20, 2019
Saliency Based Fire Detection Using Texture and Color Features
Maedeh Jamali, Nader Karimi, Shadrokh Samavi
Due to industrial development and the expansion of urban areas, early warning systems have an essential role in signaling emergencies. Fire is an event that can rapidly spread and cause injury, death, and damage. Early detection of fire could significantly reduce these losses. Video-based fire detection is a low-cost and fast method in comparison with conventional fire detectors. Most available fire detection methods have a high false-positive rate and low accuracy. In this paper, we increase accuracy by using spatial and temporal features. Captured video sequences are divided into spatio-temporal blocks. Then a saliency map and a combination of color and texture features are used for detecting fire regions. We use the HSV color model as a spatial feature and LBP-TOP for temporal processing of fire texture. Fire detection tests on publicly available datasets have shown the accuracy and robustness of the algorithm.
IV · Nov 3, 2019
Gland Segmentation in Histopathological Images by Deep Neural Network
Safiye Rezaei, Ali Emami, Nader Karimi et al.
Histology is vital in the diagnosis and prognosis of cancers and many other diseases. For the analysis of histopathological images, we need to detect and segment all gland structures. These images are very challenging, and the task of segmentation is challenging even for specialists. Segmentation of glands determines the grade of cancers such as colon, breast, and prostate cancer. Given that deep neural networks have achieved high performance on medical images, we propose a method based on the LinkNet network for gland segmentation. We investigated the effects of using different loss functions. Using the Warwick-Qu dataset, which contains two test sets and one train set, we show that our approach is comparable to state-of-the-art methods. Finally, it is shown that enhancing the gland edges and using hematoxylin components can improve the performance of the proposed model.