IV · Mar 12, 2023
Endoscopy Classification Model Using Swin Transformer and Saliency Map
Zahra Sobhaninia, Nasrin Abharian, Nader Karimi et al.
Endoscopy is a valuable tool for the early diagnosis of colon cancer. However, it requires the expertise of endoscopists and is a time-consuming process. In this work, we propose a new multi-label classification method, which considers two aspects of learning approaches (local and global views) for endoscopic image classification. The model consists of a Swin transformer branch and a modified VGG16 model as a CNN branch. To help the learning process of the CNN branch, the model employs saliency maps and endoscopy images and concatenates them. The results demonstrate that this method performed well for endoscopic medical images by utilizing local and global features of the images. Furthermore, quantitative evaluations prove the proposed method's superiority over state-of-the-art works.
CV · Jun 12, 2023
Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions
MohammadHossein Givkashi, MohammadReza Naderi, Nader Karimi et al.
Image retargeting aims to alter the size of the image with attention to the contents. One of the main obstacles to training deep learning models for image retargeting is the need for a vast labeled dataset. Labeled datasets are unavailable for training deep learning models in the image retargeting tasks. As a result, we present a new supervised approach for training deep learning models. We use the original images as ground truth and create inputs for the model by resizing and cropping the original images. A second challenge is generating different image sizes in inference time. However, regular convolutional neural networks cannot generate images of different sizes than the input image. To address this issue, we introduced a new method for supervised learning. In our approach, a mask is generated to show the desired size and location of the object. Then the mask and the input image are fed to the network. Comparing image retargeting methods and our proposed method demonstrates the model's ability to produce high-quality retargeted images. Afterward, we compute the image quality assessment score for each output image based on different techniques and illustrate the effectiveness of our approach.
CV · Jan 9, 2023
SFI-Swin: Symmetric Face Inpainting with Swin Transformer by Distinctly Learning Face Components Distributions
MohammadReza Naderi, MohammadHossein Givkashi, Nader Karimi et al.
Image inpainting consists of filling holes or missing parts of an image. Inpainting face images with symmetric characteristics is more challenging than inpainting a natural scene. None of the powerful existing models can fill out the missing parts of an image while considering the symmetry and homogeneity of the picture. Moreover, the metrics that assess a repaired face image quality cannot measure the preservation of symmetry between the rebuilt and existing parts of a face. In this paper, we intend to solve the symmetry problem in the face inpainting task by using multiple discriminators that check each face organ's reality separately and a transformer-based network. We also propose "symmetry concentration score" as a new metric for measuring the symmetry of a repaired face image. The quantitative and qualitative results show the superiority of our proposed method compared to some of the recently proposed algorithms in terms of the reality, symmetry, and homogeneity of the inpainted parts.
CV · Sep 11, 2022
OAIR: Object-Aware Image Retargeting Using PSO and Aesthetic Quality Assessment
Mohammad Reza Naderi, Mohammad Hossein Givkashi, Nader Karimi et al.
Image retargeting aims at altering an image size while preserving important content and minimizing noticeable distortions. However, previous image retargeting methods create outputs that suffer from artifacts and distortions. Besides, most previous works attempt to retarget the background and foreground of the input image simultaneously. Simultaneous resizing of the foreground and background changes the aspect ratios of the objects, which is particularly undesirable for human subjects. We propose a retargeting method that overcomes these problems. The proposed approach consists of the following steps. First, an inpainting method uses the input image and the binary mask of foreground objects to produce a background image without any foreground objects. Second, the seam carving method resizes the background image to the target size. Third, a super-resolution method increases the input image quality, and the foreground objects are then extracted. Finally, the retargeted background and the extracted super-resolved objects are fed into a particle swarm optimization (PSO) algorithm. The PSO algorithm uses aesthetic quality assessment as its objective function to identify the best location and size for the objects to be placed in the background. We used image quality assessment and aesthetic quality assessment measures to show our superior results compared to popular image retargeting techniques.
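The final placement step can be illustrated with a minimal particle swarm optimizer. This is only a sketch under stated assumptions: the `toy_aesthetic_score` function is a hypothetical stand-in for the aesthetic quality assessment model, each particle encodes a hypothetical `(x, y, scale)` placement, and the inertia and acceleration coefficients are common textbook values rather than settings taken from the paper.

```python
import random

def pso_maximize(objective, bounds, n_particles=30, n_iters=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Generic PSO that maximizes `objective` inside box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(p) for p in pos]         # personal best scores
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the allowed placement range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def toy_aesthetic_score(placement):
    """Hypothetical score that peaks at x=0.3, y=0.6, scale=1.2."""
    x, y, scale = placement
    return -((x - 0.3) ** 2 + (y - 0.6) ** 2 + (scale - 1.2) ** 2)

# Search over normalized position (x, y) and object scale.
best_placement, best_score = pso_maximize(
    toy_aesthetic_score, bounds=[(0.0, 1.0), (0.0, 1.0), (0.5, 2.0)])
```

In a real pipeline the objective would composite the object onto the retargeted background at the candidate placement and score the result, which is far more expensive than this toy function.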
CV · Nov 15, 2022
Dynamic-Pix2Pix: Noise Injected cGAN for Modeling Input and Target Domain Joint Distributions with Limited Training Data
Mohammadreza Naderi, Nader Karimi, Ali Emami et al.
Learning to translate images from a source to a target domain, with applications such as converting simple line drawings to oil paintings, has attracted significant attention. The quality of translated images is directly related to two crucial issues. First, the consistency of the output distribution with that of the target is essential. Second, the generated output should have a high correlation with the input. Conditional Generative Adversarial Networks (cGANs) are the most common models for translating images. The performance of a cGAN drops when we use a limited training dataset. In this work, we increase the target distribution modeling ability of Pix2Pix (a form of cGAN) with the help of dynamic neural network theory. Our model has two learning cycles. The model learns the correlation between input and ground truth in the first cycle. Then, the model's architecture is refined in the second cycle to learn the target distribution from noise input. These processes are executed in each iteration of the training procedure. Helping the cGAN learn the target distribution from noise input results in better model generalization at test time and allows the model to fit almost perfectly to the target domain distribution. As a result, our model surpasses the Pix2Pix model in segmenting HC18 and Montgomery's chest x-ray images. Both qualitative results and Dice scores show the superiority of our model. Although our proposed method does not use thousands of additional samples for pretraining, it produces comparable results for in-domain and out-of-domain generalization compared to the state-of-the-art methods.
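For reference, the Dice score used to compare segmentation outputs can be computed as follows. This is a minimal sketch for flattened binary masks; returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# One overlapping pixel, two predicted pixels, one target pixel.
example = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

For binary masks the Dice coefficient is numerically identical to the F1 score.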
LG · Nov 26, 2025
RAVQ-HoloNet: Rate-Adaptive Vector-Quantized Hologram Compression
Shima Rafiei, Zahra Nabizadeh Shahr Babak, Shadrokh Samavi et al.
Holography offers significant potential for AR/VR applications, yet its adoption is limited by the high demands of data compression. Existing deep learning approaches generally lack rate adaptivity within a single network. We present RAVQ-HoloNet, a rate-adaptive vector quantization framework that achieves high-fidelity reconstructions at low and ultra-low bit rates, outperforming current state-of-the-art methods. At low bit rates, our method achieves a BD-Rate of -33.91% and a BD-PSNR gain of 1.02 dB relative to the best existing method, as demonstrated by the rate-distortion curves.
CV · Apr 9, 2022
Adaptive search area for fast motion estimation
S. M. Reza Soroushmehr, Shadrokh Samavi, Shahram Shirani
This paper suggests a new method for determining the search area for a motion estimation algorithm based on block matching. The search area is adaptively found in the proposed method for each frame block. This search area is similar to that of the full search (FS) algorithm but smaller for most blocks of a frame. Therefore, the proposed algorithm is analogous to FS in terms of regularity but has much less computational complexity. The temporal and spatial correlations among the motion vectors of blocks are used to find the search area. The matched block is chosen from a rectangular area that the prediction vectors set out. Simulation results indicate that the speed of the proposed algorithm is at least seven times better than the FS algorithm.
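The idea of restricting the search to a rectangle spanned by prediction vectors can be sketched as below. This is an illustration under assumptions, not the paper's exact rule: the rectangle is taken as the bounding box of the prediction vectors plus the zero vector, widened by a hypothetical `margin`, and the matching cost is the sum of absolute differences (SAD). In practice the prediction vectors would come from spatially neighbouring blocks and the co-located block of the previous frame.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block and its displaced candidate."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(n) for x in range(n))

def search_rectangle(prediction_vectors, margin=1):
    """Bounding box (x_min, x_max, y_min, y_max) of the predictions and the zero vector."""
    xs = [v[0] for v in prediction_vectors] + [0]
    ys = [v[1] for v in prediction_vectors] + [0]
    return min(xs) - margin, max(xs) + margin, min(ys) - margin, max(ys) + margin

def match_block(cur, ref, bx, by, n, prediction_vectors, margin=1):
    """Exhaustively search only inside the adaptive rectangle."""
    h, w = len(ref), len(ref[0])
    x0, x1, y0, y1 = search_rectangle(prediction_vectors, margin)
    best_cost, best_vec = None, None
    for dy in range(y0, y1 + 1):
        for dx in range(x0, x1 + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_vec, best_cost

# Synthetic frames: the current frame is the reference shifted by (2, 1).
SIZE = 12
ref = [[(3 * x * x + 5 * y * y + 7 * x * y) % 97 for x in range(SIZE)]
       for y in range(SIZE)]
cur = [[ref[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)] for x in range(SIZE)]
       for y in range(SIZE)]
vector, cost = match_block(cur, ref, bx=4, by=4, n=4,
                           prediction_vectors=[(1, 1), (2, 0)])
```

Inside the rectangle the search remains exhaustive and regular, as in full search, but far fewer candidates are evaluated than with a fixed window of, say, ±7 pixels.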
CV · Apr 4
HistoFusionNet: Histogram-Guided Fusion and Frequency-Adaptive Refinement for Nighttime Image Dehazing
Mohammad Heydari, Wei Dong, Shahram Shirani et al.
Nighttime image dehazing remains a challenging low-level vision problem due to the joint presence of haze, glow, non-uniform illumination, color distortion, and sensor noise, which often invalidate assumptions commonly used in daytime dehazing. To address these challenges, we propose HistoFusionNet, a transformer-enhanced architecture tailored for nighttime image dehazing by combining histogram-guided representation learning with frequency-adaptive feature refinement. Built upon a multi-scale encoder-decoder backbone, our method introduces histogram transformer blocks that model long-range dependencies by grouping features according to their dynamic-range characteristics, enabling more effective aggregation of similarly degraded regions under complex nighttime lighting. To further improve restoration fidelity, we incorporate a frequency-aware refinement branch that adaptively exploits complementary low- and high-frequency cues, helping recover scene structures, suppress artifacts, and enhance local details. This design yields a unified framework that is particularly well suited to the heterogeneous degradations encountered in real nighttime hazy scenes. Extensive experiments and highly competitive performance of our method on the NTIRE 2026 Nighttime Image Dehazing Challenge benchmark demonstrate the effectiveness of the proposed method. Our team ranked 1st among 22 participating teams, highlighting the robustness and competitive performance of HistoFusionNet. The code is available at: https://github.com/heydarimo/Night-Time-Dehazing
AS · Oct 4, 2024
Manikin-Recorded Cardiopulmonary Sounds Dataset Using Digital Stethoscope
Yasaman Torabi, Shahram Shirani, James P. Reilly
Heart and lung sounds are crucial for healthcare monitoring. Recent improvements in stethoscope technology have made it possible to capture patient sounds with enhanced precision. In this dataset, we used a digital stethoscope to capture both heart and lung sounds, including individual and mixed recordings. To our knowledge, this is the first dataset to offer both separate and mixed cardiorespiratory sounds. The recordings were collected from a clinical manikin, a patient simulator designed to replicate human physiological conditions, generating clean heart and lung sounds at different body locations. This dataset includes both normal sounds and various abnormalities (i.e., murmur, atrial fibrillation, tachycardia, atrioventricular block, third and fourth heart sound, wheezing, crackles, rhonchi, pleural rub, and gurgling sounds). The dataset includes audio recordings of chest examinations performed at different anatomical locations, as determined by specialist nurses. Each recording has been enhanced using frequency filters to highlight specific sound types. This dataset is useful for applications in artificial intelligence, such as automated cardiopulmonary disease detection, sound classification, unsupervised separation techniques, and deep learning algorithms related to audio signal processing.
LG · Nov 4, 2025
QuPCG: Quantum Convolutional Neural Network for Detecting Abnormal Patterns in PCG Signals
Yasaman Torabi, Shahram Shirani, James P. Reilly
Early identification of abnormal physiological patterns is essential for the timely detection of cardiac disease. This work introduces a hybrid quantum-classical convolutional neural network (QCNN) designed to classify S3 and murmur abnormalities in heart sound signals. The approach transforms one-dimensional phonocardiogram (PCG) signals into compact two-dimensional images through a combination of wavelet feature extraction and adaptive threshold compression methods. We compress the cardiac-sound patterns into an 8-pixel image so that only 8 qubits are needed for the quantum stage. Preliminary results on the HLS-CMDS dataset demonstrate 93.33% classification accuracy on the test set and 97.14% on the train set, suggesting that quantum models can efficiently capture temporal-spectral correlations in biomedical signals. To our knowledge, this is the first application of a QCNN algorithm for bioacoustic signal processing. The proposed method represents an early step toward quantum-enhanced diagnostic systems for resource-constrained healthcare environments.
MM · Dec 8, 2023
High-Quality Live Video Streaming via Transcoding Time Prediction and Preset Selection
Zahra Nabizadeh Shahre-Babak, Nader Karimi, Krishna Rapaka et al.
Video streaming often requires transcoding content into different resolutions and bitrates to match the recipient's internet speed and screen capabilities. Video encoders like x264 offer various presets, each with different tradeoffs between transcoding time and rate-distortion performance. Choosing the best preset for video transcoding is difficult, especially for live streaming, as trying all the presets and choosing the best one is not feasible. One solution is to predict each preset's transcoding time and select the preset that ensures the highest quality while adhering to live streaming time constraints. Prediction of video transcoding time is also critical in minimizing streaming delays, deploying resource management algorithms, and load balancing. We propose a learning-based framework for predicting the transcoding time of videos across various presets. Our predictor's features for video transcoding time prediction are derived directly from the ingested stream, primarily from the header or metadata. As a result, only minimal additional delay is incurred for feature extraction, rendering our approach ideal for live-streaming applications. We evaluated our learning-based transcoding time prediction using a dataset of videos. The results demonstrate that our framework can accurately predict the transcoding time for different presets, with a mean absolute percentage error (MAPE) of nearly 5.0%. Leveraging these predictions, we then select the most suitable transcoding preset for live video streaming. Utilizing our transcoding time prediction-based preset selection improved Peak Signal-to-Noise Ratio (PSNR) of up to 5 dB.
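The selection rule described above (pick the highest-quality preset whose predicted transcoding time fits the live-streaming budget) and the MAPE metric can be sketched as follows. The preset names are real x264 presets, but the predicted times, the time budget, and the quality ranking are hypothetical values chosen for illustration; the ranking only assumes that slower presets give better rate-distortion performance.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def select_preset(predicted_times, quality_rank, time_budget):
    """Return the feasible preset with the highest quality rank, or None."""
    feasible = [name for name, t in predicted_times.items() if t <= time_budget]
    if not feasible:
        return None
    return max(feasible, key=lambda name: quality_rank[name])

# Hypothetical predicted transcoding times (seconds) for a 1-second segment.
predicted_times = {"veryfast": 0.4, "medium": 0.9, "slow": 1.6}
# Assumed ordering: a higher rank means better rate-distortion performance.
quality_rank = {"veryfast": 0, "medium": 1, "slow": 2}

chosen = select_preset(predicted_times, quality_rank, time_budget=1.0)
error = mape(actual=[2.0, 4.0], predicted=[2.1, 3.8])
```

Here `slow` is rejected because its predicted time exceeds the budget, so `medium` is chosen; the toy MAPE evaluates to about 5%.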
LG · Oct 8, 2025
Chem-NMF: Multi-layer $α$-divergence Non-Negative Matrix Factorization for Cardiorespiratory Disease Clustering, with Improved Convergence Inspired by Chemical Catalysts and Rigorous Asymptotic Analysis
Yasaman Torabi, Shahram Shirani, James P. Reilly
Non-Negative Matrix Factorization (NMF) is an unsupervised learning method offering low-rank representations across various domains such as audio processing, biomedical signal analysis, and image recognition. The incorporation of $α$-divergence in NMF formulations enhances flexibility in optimization, yet extending these methods to multi-layer architectures presents challenges in ensuring convergence. To address this, we introduce a novel approach inspired by the Boltzmann probability of the energy barriers in chemical reactions to theoretically perform convergence analysis. We introduce a novel method, called Chem-NMF, with a bounding factor which stabilizes convergence. To our knowledge, this is the first study to apply a physical chemistry perspective to rigorously analyze the convergence behaviour of the NMF algorithm. We start from mathematically proven asymptotic convergence results and then show how they apply to real data. Experimental results demonstrate that the proposed algorithm improves clustering accuracy by 5.6% $\pm$ 2.7% on biomedical signals and 11.1% $\pm$ 7.2% on face images (mean $\pm$ std).
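As background, one widely used form of the $α$-divergence between non-negative vectors $p$ and $q$ is $D_α(p \| q) = \frac{1}{α(α-1)} \sum_i \left( p_i^{α} q_i^{1-α} - α p_i + (α-1) q_i \right)$. The sketch below evaluates that form; whether Chem-NMF uses exactly this parameterization is an assumption, and the limiting cases $α \to 0$ and $α \to 1$ (which reduce to Kullback-Leibler divergences) are not handled.

```python
def alpha_divergence(p, q, alpha):
    """Alpha-divergence between positive vectors p and q (alpha not in {0, 1})."""
    if alpha in (0, 1):
        raise ValueError("alpha = 0 and alpha = 1 are limiting cases not handled here")
    total = sum(pi ** alpha * qi ** (1 - alpha) - alpha * pi + (alpha - 1) * qi
                for pi, qi in zip(p, q))
    return total / (alpha * (alpha - 1))

# With alpha = 2 the divergence reduces to sum((p - q)^2 / (2 q)).
same = alpha_divergence([1.0, 2.0], [1.0, 2.0], alpha=2.0)
different = alpha_divergence([1.0], [2.0], alpha=2.0)
```

In an NMF setting, `p` would hold entries of the data matrix and `q` the corresponding entries of the low-rank reconstruction.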
IV · Nov 22, 2024
BrightVAE: Luminosity Enhancement in Underexposed Endoscopic Images
Farzaneh Koohestani, Zahra Nabizadeh, Nader Karimi et al.
The enhancement of image luminosity is especially critical in endoscopic images. Underexposed endoscopic images often suffer from reduced contrast and uneven brightness, significantly impacting diagnostic accuracy and treatment planning. Internal body imaging is challenging due to uneven lighting and shadowy regions. Enhancing such images is essential since precise image interpretation is crucial for patient outcomes. In this paper, we introduce BrightVAE, an architecture based on the hierarchical Vector Quantized Variational Autoencoder (hierarchical VQ-VAE) tailored explicitly for enhancing luminosity in low-light endoscopic images. Our architecture is designed to tackle the unique challenges inherent in endoscopic imaging, such as significant variations in illumination and obscured details due to poor lighting conditions. The proposed model emphasizes advanced feature extraction from three distinct viewpoints (various receptive fields, skip connections, and feature attentions) to robustly enhance image quality and support more accurate medical diagnoses. Through rigorous experimental analysis, we demonstrate the effectiveness of these techniques in enhancing low-light endoscopic images. We evaluate our architecture with three widely recognized metrics (SSIM, PSNR, and LPIPS) on the Endo4IE dataset, which consists exclusively of endoscopic images, and show significant advancements over the state-of-the-art methods for enhancing luminosity in endoscopic imaging.
SP · Jun 18, 2024
MEMS and ECM Sensor Technologies for Cardiorespiratory Sound Monitoring - A Comprehensive Review
Yasaman Torabi, Shahram Shirani, James P. Reilly et al.
This paper presents a comprehensive review of cardiorespiratory auscultation sensing devices (i.e., stethoscopes), which is useful for understanding the theoretical aspects and practical design notes. In this paper, we first introduce the acoustic properties of the heart and lungs, as well as a brief history of stethoscope evolution. Then, we discuss the basic concept of electret condenser microphones (ECMs) and a stethoscope based on them. Then, we discuss the microelectromechanical systems (MEMSs) technology, particularly focusing on piezoelectric transducer sensors. This paper comprehensively reviews sensing technologies for cardiorespiratory auscultation, emphasizing MEMS-based wearable designs in the past decade. To our knowledge, this is the first paper to summarize ECM and MEMS applications for heart and lung sound analysis.
SD · Jun 3, 2024
Sequence-to-Sequence Multi-Modal Speech In-Painting
Mahsa Kadkhodaei Elyaderani, Shahram Shirani
Speech in-painting is the task of regenerating missing audio contents using reliable context information. Despite various recent studies in multi-modal perception of audio in-painting, there is still a need for an effective infusion of visual and auditory information in speech in-painting. In this paper, we introduce a novel sequence-to-sequence model that leverages the visual information to in-paint audio signals via an encoder-decoder architecture. The encoder plays the role of a lip-reader for facial recordings and the decoder takes both encoder outputs as well as the distorted audio spectrograms to restore the original speech. Our model outperforms an audio-only speech in-painting model and has comparable results with a recent multi-modal speech in-painter in terms of speech quality and intelligibility metrics for distortions of 300 ms to 1500 ms duration, which proves the effectiveness of the introduced multi-modality in speech in-painting.
MM · Jun 2, 2024
Robust Multi-Modal Speech In-Painting: A Sequence-to-Sequence Approach
Mahsa Kadkhodaei Elyaderani, Shahram Shirani
The process of reconstructing missing parts of speech audio from context is called speech in-painting. Human perception of speech is inherently multi-modal, involving both audio and visual (AV) cues. In this paper, we introduce and study a sequence-to-sequence (seq2seq) speech in-painting model that incorporates AV features. Our approach extends AV speech in-painting techniques to scenarios where both audio and visual data may be jointly corrupted. To achieve this, we employ a multi-modal training paradigm that boosts the robustness of our model across various conditions involving acoustic and visual distortions. This makes our distortion-aware model a plausible solution for real-world challenging environments. We compare our method with existing transformer-based and recurrent neural network-based models, which attempt to reconstruct missing speech gaps ranging from a few milliseconds to over a second. Our experimental results demonstrate that our novel seq2seq architecture outperforms the state-of-the-art transformer solution by 38.8% in terms of enhancing speech quality and 7.14% in terms of improving speech intelligibility. We exploit a multi-task learning framework that simultaneously performs lip-reading (transcribing video components to text) while reconstructing missing parts of the associated speech.
MM · Apr 13, 2024
A Parametric Rate-Distortion Model for Video Transcoding
Maedeh Jamali, Nader Karimi, Shadrokh Samavi et al.
Over the past two decades, the surge in video streaming applications has been fueled by the increasing accessibility of the internet and the growing demand for network video. As users with varying internet speeds and devices seek high-quality video, transcoding becomes essential for service providers. In this paper, we introduce a parametric rate-distortion (R-D) transcoding model. Our model excels at predicting transcoding distortion at various rates without the need for encoding the video. This model serves as a versatile tool that can be used to achieve visual quality improvement (in terms of PSNR) via trans-sizing. Moreover, we use our model to identify visually lossless and near-zero-slope bitrate ranges for an ingest video. Having this information allows us to adjust the transcoding target bitrate while introducing visually negligible quality degradations. By utilizing our model in this manner, quality improvements up to 2 dB and bitrate savings of up to 46% of the original target bitrate are possible. Experimental results demonstrate the efficacy of our model in video transcoding rate distortion prediction.
CV · Sep 12, 2021
MSGDD-cGAN: Multi-Scale Gradients Dual Discriminator Conditional Generative Adversarial Network
Mohammadreza Naderi, Zahra Nabizadeh, Nader Karimi et al.
Conditional Generative Adversarial Networks (cGANs) have been used in many image processing tasks. However, they still have serious problems maintaining the balance between conditioning the output on the input and creating the output with the desired distribution based on the corresponding ground truth. The traditional cGANs, similar to most conventional GANs, suffer from vanishing gradients, which backpropagate from the discriminator to the generator. Moreover, the traditional cGANs are sensitive to architectural changes due to the previously mentioned gradient problems. Therefore, balancing the architecture of the cGANs is almost impossible. Recently, MSG-GAN has been proposed to stabilize the performance of GANs by applying multiple connections between the generator and discriminator. In this work, we propose a method called MSGDD-cGAN, which first stabilizes the performance of the cGANs using multi-connection gradient flow. Secondly, the proposed network architecture balances the correlation of the output to the input and the fitness of the output to the target distribution. This balance is achieved by using the proposed dual discrimination procedure. We tested our model on the segmentation of fetal ultrasound images. Our model shows a 3.18% increase in the F1 score compared to the pix2pix version of cGANs.
MM · May 24, 2021
Robust Watermarking using Diffusion of Logo into Autoencoder Feature Maps
Maedeh Jamali, Nader Karim, Pejman Khadivi et al.
Digital content has grown dramatically in recent years, leading to increased attention to copyright. Image watermarking has been considered one of the most popular methods for copyright protection. With the recent advancements in applying deep neural networks in image processing, these networks have also been used in image watermarking. Robustness and imperceptibility are two competing requirements of watermarking methods, and the trade-off between them must be balanced. In this paper, we propose to use an end-to-end network for watermarking. We use a convolutional neural network (CNN) to control the embedding strength based on the image content. Dynamic embedding helps the network to have the lowest effect on the visual quality of the watermarked image. Different image processing attacks are simulated as a network layer to improve the robustness of the model. Our method is a blind watermarking approach that replicates the watermark string to create a matrix of the same size as the input image. Instead of diffusing the watermark data into the input image, we inject the data into the feature space and force the network to do this in regions that increase the robustness against various attacks. Experimental results show the superiority of the proposed method in terms of imperceptibility and robustness compared to the state-of-the-art algorithms.
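The replication of the watermark string into an image-sized matrix can be sketched as below. The row-major tiling order is an assumption, and `recover_by_majority` is a hypothetical helper added only to illustrate why redundancy helps against localized damage; the paper extracts the watermark with its network, not with a majority vote.

```python
def replicate_watermark(bits, height, width):
    """Tile the watermark bits in row-major order to fill a height x width matrix."""
    return [[bits[(r * width + c) % len(bits)] for c in range(width)]
            for r in range(height)]

def recover_by_majority(matrix, n_bits):
    """Hypothetical decoder: majority vote over all copies of each bit."""
    width = len(matrix[0])
    ones, counts = [0] * n_bits, [0] * n_bits
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            index = (r * width + c) % n_bits
            ones[index] += value
            counts[index] += 1
    return [1 if 2 * ones[i] > counts[i] else 0 for i in range(n_bits)]

watermark = [1, 0, 1]
tiled = replicate_watermark(watermark, height=4, width=4)
tiled[0][0] = 0                       # simulate one corrupted copy of bit 0
recovered = recover_by_majority(tiled, n_bits=len(watermark))
```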
HC · Oct 21, 2020
Literature Review of Computer Tools for the Visually Impaired: a focus on Search Engines
Guy Meyer, Alan Wassyng, Mark Lawford et al.
A sudden reliance on the internet has resulted in the global standardization of specific software and interfaces tailored for the average user. Whether it be web apps or dedicated software, the methods of interaction are seemingly similar. But when the computer tool is presented with unique users, specifically those with a disability, the quality of interaction degrades, sometimes to the point of complete uselessness. This stems from a focus on the average user rather than on the development of a platform for all (a gold standard). This paper reviews published works and products that deal with providing accessibility to visually impaired online users. Due to the variety of tools that are available to computer users, the paper focuses on search engines as a primary tool for browsing the web. By analyzing the attributes discussed below, the reader is equipped with a set of references for existing applications, along with practical insight and recommendations for accessible design. Finally, the necessary considerations for future developments and summaries of important focal points are highlighted.
IV · Sep 1, 2020
Classification of Diabetic Retinopathy Using Unlabeled Data and Knowledge Distillation
Sajjad Abbasi, Mohsen Hajabdollahi, Pejman Khadivi et al.
Knowledge distillation allows transferring knowledge from a pre-trained model to another. However, it suffers from limitations, including the constraint that the two models need to be architecturally similar. Knowledge distillation addresses some of the shortcomings associated with transfer learning by generalizing a complex model to a lighter model. However, some parts of the knowledge may not be sufficiently distilled. In this paper, a novel knowledge distillation approach using transfer learning is proposed. The proposed method transfers the entire knowledge of a model to a new smaller one. To accomplish this, unlabeled data are used in an unsupervised manner to transfer the maximum amount of knowledge to the new slimmer model. The proposed method can be beneficial in medical image analysis, where labeled data are typically scarce. The proposed approach is evaluated in the context of classification of images for diagnosing Diabetic Retinopathy on two publicly available datasets, Messidor and EyePACS. Simulation results demonstrate that the approach is effective in transferring knowledge from a complex model to a lighter one. Furthermore, experimental results illustrate that the performance of different small models is improved significantly using unlabeled data and knowledge distillation.
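Because the teacher's outputs serve as targets, distillation needs no ground-truth labels, which is what makes unlabeled images usable. A minimal sketch of a distillation loss follows; it uses the standard temperature-softened softmax and KL divergence from the general distillation literature, which is an assumption and not necessarily the exact loss used in this paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs; no label is required."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical logits for one unlabeled image.
teacher = [2.0, 0.5, -1.0]
matched = distillation_loss(teacher, [2.0, 0.5, -1.0])
mismatched = distillation_loss(teacher, [-1.0, 0.5, 2.0])
```

The loss is zero when the student reproduces the teacher's output distribution and grows as the two diverge.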
MM · May 11, 2020
Hardware Implementation of Adaptive Watermarking Based on Local Spatial Disorder Analysis
Mohsen Hajabdolahi, Nader Karimi, Shahram Shirani et al.
With the increasing use of the internet and the ease of exchange of multimedia content, the protection of ownership rights has become a significant concern. Watermarking is an efficient means for this purpose. In many applications, real-time watermarking is required, which demands a hardware implementation of a low-complexity, robust algorithm. In this paper, an adaptive watermarking method is presented, which uses embedding in different bit-planes to achieve transparency and robustness. Local disorder of pixels is analyzed to control the strength of the watermark. A new low-complexity method for disorder analysis is proposed, and its hardware implementation is presented. An embedding method is proposed, which causes lower degradation in the watermarked image. Also, the performance of the proposed watermarking architecture is improved by a pipeline structure and is tested on an FPGA device. Results show that the algorithm produces transparent and robust watermarked images. The synthesis report from the FPGA implementation illustrates a low-complexity hardware structure.
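The idea of choosing the embedding bit-plane from local disorder can be sketched as follows. Everything specific here is an assumption made for illustration: the disorder measure (mean absolute difference to the 8 neighbours), the threshold, and the two candidate bit-planes are hypothetical and are not the low-complexity measure proposed in the paper.

```python
def local_disorder(image, r, c):
    """Mean absolute difference between pixel (r, c) and its in-bounds neighbours."""
    h, w = len(image), len(image[0])
    diffs = [abs(image[r][c] - image[ny][nx])
             for ny in range(max(0, r - 1), min(h, r + 2))
             for nx in range(max(0, c - 1), min(w, c + 2))
             if (ny, nx) != (r, c)]
    return sum(diffs) / len(diffs) if diffs else 0.0

def choose_bit_plane(disorder, threshold=10.0, smooth_plane=0, busy_plane=2):
    """Busy regions hide stronger changes, so a higher bit-plane is used there."""
    return busy_plane if disorder >= threshold else smooth_plane

def set_bit(value, plane, bit):
    """Overwrite one bit-plane of an 8-bit pixel value."""
    return (value & ~(1 << plane)) | (bit << plane)

def embed_bit(image, r, c, bit):
    plane = choose_bit_plane(local_disorder(image, r, c))
    return set_bit(image[r][c], plane, bit), plane

flat = [[100] * 3 for _ in range(3)]
busy = [[0, 255, 0], [255, 100, 255], [0, 255, 0]]
flat_pixel, flat_plane = embed_bit(flat, 1, 1, bit=1)
busy_pixel, busy_plane = embed_bit(busy, 1, 1, bit=0)
```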
IV · Apr 18, 2020
A fast semi-automatic method for classification and counting the number and types of blood cells in an image
Hamed Sadeghi, Shahram Shirani, David W. Capson
A novel and fast semi-automatic method for segmentation, locating and counting blood cells in an image is proposed. In this method, thresholding is used to separate the nucleus from the other parts. We also use Hough transform for circles to locate the center of white cells. Locating and counting of red cells is performed using template matching. We make use of finding local maxima, labeling and mean value computation in order to shrink the areas obtained after applying Hough transform or template matching, to a single pixel as representative of location of each region. The proposed method is very fast and computes the number and location of white cells accurately. It is also capable of locating and counting the red cells with a small error.
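The step that shrinks each detected area to a single representative pixel can be sketched with connected-component labeling followed by a mean of the pixel coordinates. This sketch assumes a binary detection mask (for example, a thresholded Hough accumulator or template-matching response) and 4-connectivity; both are choices made here for illustration.

```python
from collections import deque

def region_centers(mask):
    """Label 4-connected regions of a binary mask and return one (row, col) centre each."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    centers = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this region.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # The mean coordinate becomes the single representative pixel.
            cy = round(sum(p[0] for p in pixels) / len(pixels))
            cx = round(sum(p[1] for p in pixels) / len(pixels))
            centers.append((cy, cx))
    return centers

mask = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
]
centers = region_centers(mask)
cell_count = len(centers)
```

The number of returned centres gives the cell count, and each centre gives a cell location.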
CV · Mar 27, 2020
Acceleration of Convolutional Neural Network Using FFT-Based Split Convolutions
Kamran Chitsaz, Mohsen Hajabdollahi, Nader Karimi et al.
Convolutional neural networks (CNNs) have a large number of variables and hence suffer from a complexity problem in their implementation. Different methods and techniques, such as quantization and pruning, have been developed to alleviate the complexity of CNNs. Among the different simplification methods, computation in the Fourier domain is regarded as a new paradigm for the acceleration of CNNs. Recent studies on Fast Fourier Transform (FFT) based CNNs aim at simplifying the computations required for the FFT. However, there is still considerable room for reducing the computational complexity of the FFT. In this paper, a new method for CNN processing in the FFT domain is proposed, which is based on input splitting. Computing the FFT with small kernels, as is typical in CNNs, is problematic. Splitting can be considered an effective solution to the issues arising from small kernels. With splitting, redundancy such as that of overlap-and-add is reduced and efficiency is increased. A hardware implementation of the proposed FFT method, as well as different analyses of the complexity, are presented to demonstrate the performance of the proposed method.
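For context, the classical overlap-add scheme mentioned in the abstract splits the input into segments, convolves each segment with the small kernel in the FFT domain, and sums the overlapping tails. The sketch below shows that conventional baseline in one dimension with a plain radix-2 FFT; it is not the split-convolution method proposed in the paper, and the segment length is an arbitrary choice.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_convolve(a, b):
    """Linear convolution via zero-padded FFTs."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    product = fft([x * y for x, y in zip(fa, fb)], inverse=True)
    # The inverse transform above is unnormalized, so divide by the size.
    return [v.real / size for v in product[:out_len]]

def overlap_add_convolve(signal, kernel, segment_len):
    """Split the input into segments, convolve each, and add the overlapping tails."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for start in range(0, len(signal), segment_len):
        part = fft_convolve(signal[start:start + segment_len], kernel)
        for i, v in enumerate(part):
            out[start + i] += v
    return out

def direct_convolve(signal, kernel):
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

signal, kernel = [1, 2, 3, 4, 5, 6, 7], [1, 0, -1]
split_result = overlap_add_convolve(signal, kernel, segment_len=3)
reference = direct_convolve(signal, kernel)
```

Each segment's output is longer than the segment by `len(kernel) - 1` samples, and those tails are the redundant overlap that must be added back.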
CV · Feb 9, 2020
Unlabeled Data Deployment for Classification of Diabetic Retinopathy Images Using Knowledge Transfer
Sajjad Abbasi, Mohsen Hajabdollahi, Nader Karimi et al.
Convolutional neural networks (CNNs) are highly beneficial for medical image processing. Medical images are plentiful, but annotated data are scarce. Transfer learning is used to address the lack of labeled data and gives CNNs better training capability. Transfer learning can be used in many different medical applications; however, the model under transfer should have the same size as the original network. Knowledge distillation was recently proposed to transfer the knowledge of one model to another and can be useful for covering the shortcomings of transfer learning. However, some parts of the knowledge may not be distilled by knowledge distillation. In this paper, a novel knowledge distillation approach using transfer learning is proposed to transfer the whole knowledge of one model to another. The proposed method can be beneficial and practical for medical image analysis in which only a small amount of labeled data is available. The proposed process is tested on diabetic retinopathy classification. Simulation results demonstrate that, using the proposed method, the knowledge of a large network can be transferred to a smaller model.
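As background for the distillation idea above, here is a minimal sketch of the commonly used soft-target distillation loss, in which a student is trained to match the temperature-softened outputs of a teacher. This is the generic formulation, not the transfer-learning-based variant the paper proposes; the logits and the temperature value are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits for one image; no labels are needed for this term.
teacher_logits = [5.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]
loss = distillation_loss(student_logits, teacher_logits)
```

Because this term depends only on the teacher's outputs, it can be evaluated on unlabeled images, which is what makes distillation attractive when annotations are scarce.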
CV Feb 9, 2020
Splitting Convolutional Neural Network Structures for Efficient Inference
Emad MalekHosseini, Mohsen Hajabdollahi, Nader Karimi et al.
For convolutional neural networks (CNNs) that have a large volume of input data, memory management becomes a major concern. Memory cost reduction can be an effective way to deal with this problem and can be realized through different techniques such as feature map pruning and input data splitting. Among the various methods in this area of research, splitting the network structure is an interesting direction, and only a few works have addressed it. In this study, the problem of reducing memory utilization using network structure splitting is addressed. A new technique is proposed to split the network structure into small parts that consume less memory than the original network. The split parts can be processed almost separately, which plays an essential role in better memory management. The split approach has been tested on two well-known network structures, VGG16 and ResNet18, for the classification of CIFAR10 images. Simulation results show that the splitting method reduces both the number of computational operations and the amount of memory consumption.
CV Feb 9, 2020
Convolutional Neural Network Pruning Using Filter Attenuation
Morteza Mousa-Pasandi, Mohsen Hajabdollahi, Nader Karimi et al.
Filters are the essential elements of convolutional neural networks (CNNs). Filters correspond to the feature maps and account for the main part of the computational and memory requirements of CNN processing. In filter pruning methods, a filter with all of its components, including channels and connections, is removed. The removal of a filter can cause a drastic change in the network's performance. Also, removed filters cannot return to the network structure. We address these problems in this paper. We propose a CNN pruning method based on filter attenuation in which weak filters are not directly removed. Instead, weak filters are attenuated and gradually removed. In the proposed attenuation approach, weak filters are not abruptly removed, and there is a chance for these filters to return to the network. The filter attenuation method is assessed using the VGG model for the CIFAR10 image classification task. Simulation results show that filter attenuation works with different pruning criteria and that better results are obtained in comparison with conventional pruning methods.
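The contrast between abrupt removal and gradual attenuation can be sketched as follows. The abstract does not give the attenuation rule, so everything here is an assumption for illustration: the L1-norm ranking criterion, the attenuation factor `alpha`, the removal threshold `eps`, and the absence of retraining between steps.

```python
def l1_norm(filt):
    """Sum of absolute weights, used here as a simple 'weakness' criterion."""
    return sum(abs(w) for w in filt)

def attenuate_weak_filters(filters, prune_ratio=0.5, alpha=0.5):
    """Scale the weakest filters by alpha instead of zeroing them outright."""
    n_weak = int(len(filters) * prune_ratio)
    order = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    weak = set(order[:n_weak])
    return [[w * alpha for w in f] if i in weak else list(f)
            for i, f in enumerate(filters)]

def remove_negligible(filters, eps=1e-3):
    """Drop filters only once attenuation has made them negligible."""
    return [f for f in filters if l1_norm(f) > eps]

# Hypothetical flattened filters of one layer.
filters = [[0.9, -0.8], [0.05, 0.02], [0.4, 0.3], [0.01, -0.01]]

current = filters
for _ in range(10):
    # In real training, weight updates between steps could let an attenuated
    # filter grow again and leave the weak set; this loop omits training.
    current = attenuate_weak_filters(current)
kept = remove_negligible(current)
```

Because the ranking is recomputed at every step, a filter that regains magnitude during training would stop being attenuated, which is the "chance to return" that the abstract describes.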
CV Oct 13, 2017
Real time ridge orientation estimation for fingerprint images
Eman Alibeigi, Shadrokh Samavi, Shahram Shirani et al.
Fingerprint verification is an important biometric technique for personal identification. Most automatic verification systems are based on matching fingerprint minutiae. Extraction of minutiae is an essential process that requires estimating the orientation of the lines in an image. Most existing methods involve intense mathematical computations and hence are performed in software. In this paper, a hardware scheme for real-time orientation estimation is presented, which is based on a pipelined architecture. Synthesized circuits confirmed the functionality and accuracy of the suggested method.
MM Sep 10, 2017
Hierarchical Watermarking Framework Based on Analysis of Local Complexity Variations
Majid Mohrekesh, Shekoofeh Azizi, Shahram Shirani et al.
Increasing production and exchange of multimedia content has increased the need for better protection of copyright by means of watermarking. Different methods have been proposed to satisfy the tradeoff between imperceptibility and robustness, two important characteristics in watermarking, while maintaining proper data-embedding capacity. Many watermarking methods use an image-independent set of parameters, although different images possess different potentials for robust and transparent hosting of watermark data. To overcome this deficiency, in this paper we propose a new hierarchical adaptive watermarking framework. At the higher level of the hierarchy, the complexity of an image is ranked in comparison with the complexities of the images of a dataset. For a typical dataset of images, the statistical distribution of block complexities is found. At the lower level of the hierarchy, for a single cover image that is to be watermarked, the complexities of its blocks are found. Local complexity variation (LCV) between a block and its neighbors is used to adaptively control the watermark strength factor of each block. Such local complexity analysis creates an adaptive embedding scheme, which results in higher transparency by reducing blockiness effects. This two-level hierarchy enables our method to take advantage of all image blocks to elevate the embedding capacity while preserving imperceptibility. To test the effectiveness of the proposed framework, the contourlet transform (CT) in conjunction with the discrete cosine transform (DCT) is used to embed pseudo-random binary sequences as the watermark. Experimental results show that the proposed framework elevates the performance of the watermarking routine in terms of both robustness and transparency.
CV Sep 8, 2017
Vessel Segmentation and Catheter Detection in X-Ray Angiograms Using Superpixels
Hamid R. Fazlali, Nader Karimi, S. M. Reza Soroushmehr et al.
Coronary artery disease (CAD) is the leading cause of death around the world. One of the most common imaging methods for diagnosing this disease is X-ray angiography. Diagnosis from these images is usually challenging due to non-uniform illumination, low contrast, and the presence of other body tissues and of the catheter. These challenges make the diagnostic task of cardiologists harder and more prone to misdiagnosis. In this paper, we propose a new automated framework for coronary artery segmentation, catheter detection, and centerline extraction in X-ray angiography images. Our proposed segmentation method is based on superpixels. In this method, three different superpixel scales are first exploited, and a measure of the vesselness probability of each superpixel is determined. Majority voting is used to obtain an initial segmentation map from these three superpixel scales. This initial segmentation is refined by finding the orthogonal line on each ridge pixel of the vessel region. In this framework, we use our catheter detection and tracking method, which detects the catheter by finding its ridge in the first frame and tracks it in the other frames by fitting a second-order polynomial to it. We also use the image ridges to extract the coronary artery centerlines. We evaluated our method qualitatively and quantitatively on two different challenging datasets and compared it with one of the previous well-known coronary artery segmentation methods. Our method detected the catheter and reduced the false positive rate, in addition to achieving better segmentation results. The evaluation results show that our method performs better in a much shorter time.
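The majority-voting step over the three superpixel scales can be sketched as below. The sketch assumes each scale has already been reduced to a per-pixel binary vessel map; the map names and the toy values are hypothetical, and the vesselness measure itself is not modeled.

```python
def majority_vote(maps):
    """Per-pixel majority vote over an odd number of equally sized binary maps."""
    rows, cols = len(maps[0]), len(maps[0][0])
    needed = len(maps) // 2 + 1  # votes required for a strict majority
    return [[1 if sum(m[r][c] for m in maps) >= needed else 0
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical binary vessel maps from three superpixel scales.
fine_scale = [[1, 1, 0],
              [0, 1, 0]]
medium_scale = [[1, 0, 0],
                [0, 1, 1]]
coarse_scale = [[0, 1, 0],
                [1, 1, 1]]

initial_segmentation = majority_vote([fine_scale, medium_scale, coarse_scale])
```

A pixel is labeled as vessel only when at least two of the three scales agree, which suppresses detections that appear at a single scale.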
CV Sep 6, 2017
Adaptive Real-Time Removal of Impulse Noise in Medical Images
Zohreh HosseinKhani, Mohsen Hajabdollahi, Nader Karimi et al.
Noise is an important factor that degrades the quality of medical images. Impulse noise is a common type of noise caused by malfunctioning sensor elements or errors in the transmission of images. In medical images, due to the presence of a white foreground and a black background, many pixels have intensities similar to impulse noise, and distinguishing between noisy and regular pixels is difficult. In software techniques, the accuracy of the noise removal is more important than the algorithm's complexity. For hardware implementation, however, a low-complexity algorithm with acceptable accuracy is essential. In this paper, a low-complexity de-noising method is proposed that removes the noise by local analysis of the image blocks. The proposed method distinguishes non-noisy pixels that have noise-like intensities. All steps are designed to have low hardware complexity. Simulation results show that, for different magnetic resonance images, the proposed method removes impulse noise with acceptable accuracy.
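To make the difficulty concrete, here is a minimal sketch of one possible local-analysis rule: a pixel with an extreme intensity is treated as noise only if too few of its neighbors share that intensity, and is then replaced by the median of its non-extreme neighbors. This rule, the 3x3 window, and the parameters `min_similar` and `tol` are assumptions for illustration and are not the decision logic of the paper.

```python
from statistics import median

def denoise_impulse(image, low=0, high=255, min_similar=3, tol=10):
    """Replace isolated extreme-valued pixels with the median of clean neighbors."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            value = image[r][c]
            if value not in (low, high):
                continue  # only extreme intensities are noise candidates
            neighbors = [image[y][x]
                         for y in range(max(0, r - 1), min(rows, r + 2))
                         for x in range(max(0, c - 1), min(cols, c + 2))
                         if (y, x) != (r, c)]
            similar = sum(1 for n in neighbors if abs(n - value) <= tol)
            if similar >= min_similar:
                continue  # part of a genuine white or black region
            clean = [n for n in neighbors if n not in (low, high)]
            if clean:
                out[r][c] = int(median(clean))
    return out

# Hypothetical image: a genuine white corner plus two isolated impulses.
image = [
    [255, 255, 100, 100, 100],
    [255, 255, 100,   0, 100],
    [100, 100, 100, 100, 100],
    [100, 255, 100, 100, 100],
    [100, 100, 100, 100, 100],
]
restored = denoise_impulse(image)
```

The white 2x2 corner survives because each of its pixels has three similar neighbors, while the isolated 0 and 255 pixels are replaced by the surrounding intensity.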
MM Sep 5, 2017
Adaptive Blind Image Watermarking Using Fuzzy Inference System Based on Human Visual Perception
Maedeh Jamali, Shima Rafiei, S. M. Reza Soroushmehr et al.
The development of digital content has increased the necessity of copyright protection by means of watermarking. Imperceptibility and robustness are two important features of watermarking algorithms. The goal of watermarking methods is to satisfy the tradeoff between these two contradicting characteristics. Recently, watermarking methods in transform domains have shown favorable results. In this paper, we present an adaptive blind watermarking method that has high transparency in areas that are important to the human visual system. We propose a fuzzy system for adaptive control of the embedding strength factor. Features such as saliency, intensity, and edge concentration are used as fuzzy attributes. Redundant embedding in the discrete cosine transform (DCT) of the wavelet domain increases the robustness of our method. Experimental results show the efficiency of the proposed method, and better results are obtained compared with comparable methods using the same size of watermark logo.
CV Apr 19, 2017
OCRAPOSE II: An OCR-based indoor positioning system using mobile phone images
Hamed Sadeghi, Shahrokh Valaee, Shahram Shirani
In this paper, we propose an OCR (optical character recognition)-based localization system called OCRAPOSE II, which is applicable in a number of indoor scenarios, including office buildings, parking garages, airports, and grocery stores. In these scenarios, characters (i.e., text or numbers) can be used as suitable distinctive landmarks for localization. The proposed system takes advantage of OCR to read these characters in the query still images and provides a rough location estimate using a floor plan. Then, it finds the depth and angle-of-view of the query using the information provided by the OCR engine in order to refine the location estimate. We derive novel formulas for estimating the query angle-of-view and depth using image line segments and the OCR box information. We demonstrate the applicability and effectiveness of the proposed system through experiments in indoor scenarios. Our system performs better than state-of-the-art benchmarks in terms of location recognition rate and average localization error, especially under sparse database conditions.
MM Jun 30, 2014
Subjective and Objective Quality Assessment of Image: A Survey
Pedram Mohammadi, Abbas Ebrahimi-Moghadam, Shahram Shirani
With the increasing demand for image-based applications, the efficient and reliable evaluation of image quality has increased in importance. Measuring image quality is of fundamental importance for numerous image processing applications, where the goal of image quality assessment (IQA) methods is to automatically evaluate the quality of images in agreement with human quality judgments. Numerous IQA methods have been proposed over the past years to fulfill this goal. In this paper, a survey of the quality assessment methods for conventional image signals, as well as for newly emerged ones, which include high dynamic range (HDR) and 3-D images, is presented. A comprehensive explanation of subjective and objective IQA and their classification is provided. Six widely used subjective quality datasets and performance measures are reviewed. Emphasis is given to the full-reference image quality assessment (FR-IQA) methods, and nine often-used quality measures (including mean squared error (MSE), structural similarity index (SSIM), multi-scale structural similarity index (MS-SSIM), visual information fidelity (VIF), most apparent distortion (MAD), feature similarity measure (FSIM), feature similarity measure for color images (FSIMC), dynamic range independent measure (DRIM), and tone-mapped images quality index (TMQI)) are carefully described, and their performance and computation time on four subjective quality datasets are evaluated. Furthermore, a brief introduction to 3-D IQA is provided, and the issues related to this area of research are reviewed.
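Two of the full-reference measures listed above, MSE and SSIM, can be sketched in a few lines. The SSIM shown here is a simplified single-window version computed over the whole image with the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2; the standard index is computed over local windows and averaged, so this is only an illustration. Images are assumed to be flattened lists of 8-bit grayscale values, and the sample data are made up.

```python
def mse(reference, distorted):
    """Mean squared error between two equally sized flattened images."""
    n = len(reference)
    return sum((a - b) ** 2 for a, b in zip(reference, distorted)) / n

def global_ssim(reference, distorted, dynamic_range=255):
    """Single-window SSIM over the whole image (a simplification of local SSIM)."""
    n = len(reference)
    mu_x = sum(reference) / n
    mu_y = sum(distorted) / n
    var_x = sum((a - mu_x) ** 2 for a in reference) / n
    var_y = sum((b - mu_y) ** 2 for b in distorted) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(reference, distorted)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Hypothetical 2x3 reference image and a distorted copy, both flattened.
reference = [50, 100, 150, 200, 120, 80]
distorted = [55, 90, 160, 190, 130, 70]

error = mse(reference, distorted)
similarity = global_ssim(reference, distorted)
```

MSE grows with any pixel difference, whereas SSIM compares means, variances, and covariance, which is why the two measures can rank distortions differently.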