Hamid Bahador

CVJul 19, 2022

LR-Net: A Block-based Convolutional Neural Network for Low-Resolution Image Classification

Ashkan Ganj, Mohsen Ebadpour, Mahdi Darvish et al.

The success of CNN-based architecture on image classification in learning and extracting features made them so popular these days, but the task of image classification becomes more challenging when we apply state of art models to classify noisy and low-quality images. It is still difficult for models to extract meaningful features from this type of image due to its low-resolution and the lack of meaningful global features. Moreover, high-resolution images need more layers to train which means they take more time and computational power to train. Our method also addresses the problem of vanishing gradients as the layers become deeper in deep neural networks that we mentioned earlier. In order to address all these issues, we developed a novel image classification architecture, composed of blocks that are designed to learn both low level and global features from blurred and noisy low-resolution images. Our design of the blocks was heavily influenced by Residual Connections and Inception modules in order to increase performance and reduce parameter sizes. We also assess our work using the MNIST family datasets, with a particular emphasis on the Oracle-MNIST dataset, which is the most difficult to classify due to its low-quality and noisy images. We have performed in-depth tests that demonstrate the presented architecture is faster and more accurate than existing cutting-edge convolutional neural networks. Furthermore, due to the unique properties of our model, it can produce a better result with fewer parameters.

CVAug 28, 2021

Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection

Mahdi Darvish, Mahsa Pouramini, Hamid Bahador

Fine-grained classification remains a challenging task because distinguishing categories needs learning complex and local differences. Diversity in the pose, scale, and position of objects in an image makes the problem even more difficult. Although the recent Vision Transformer models achieve high performance, they need an extensive volume of input data. To encounter this problem, we made the best use of GAN-based data augmentation to generate extra dataset instances. Oxford-IIIT Pets was our dataset of choice for this experiment. It consists of 37 breeds of cats and dogs with variations in scale, poses, and lighting, which intensifies the difficulty of the classification task. Furthermore, we enhanced the performance of the recent Generative Adversarial Network (GAN), StyleGAN2-ADA model to generate more realistic images while preventing overfitting to the training set. We did this by training a customized version of MobileNetV2 to predict animal facial landmarks; then, we cropped images accordingly. Lastly, we combined the synthetic images with the original dataset and compared our proposed method with standard GANs augmentation and no augmentation with different subsets of training data. We validated our work by evaluating the accuracy of fine-grained image classification on the recent Vision Transformer (ViT) Model.

Hamid Bahador

2 Papers