LGFeb 12, 2023
Autoselection of the Ensemble of Convolutional Neural Networks with Second-Order Cone ProgrammingBuse Çisil Güldoğuş, Abdullah Nazhat Abdullah, Muhammad Ammar Ali et al.
Ensemble techniques are frequently encountered in machine learning and engineering problems since the method combines different models and produces an optimal predictive solution. The ensemble concept can be adapted to deep learning models to provide robustness and reliability. Due to the growth of the models in deep learning, using ensemble pruning is highly important to deal with computational complexity. Hence, this study proposes a mathematical model which prunes the ensemble of Convolutional Neural Networks (CNN) consisting of different depths and layers that maximizes accuracy and diversity simultaneously with a sparse second order conic optimization model. The proposed model is tested on CIFAR-10, CIFAR-100 and MNIST data sets which gives promising results while reducing the complexity of models, significantly.
CVMay 24, 2024
Activator: GLU Activation Function as the Core Component of a Vision TransformerAbdullah Nazhat Abdullah, Tarkan Aydin
The transformer architecture has driven many successes in a variety of tasks within the field of deep learning, in particular the recent advances in natural language processing (NLP) culminating with large language models (LLM). Adding to that success, transformer architecture has found widespread interest from computer vision (CV) researchers and practitioners, allowing for many advancements in vision-related tasks and opening the door for multitask and multi-modal deep learning architectures that share the same principle of operation. One drawback to these architectures is their reliance on the scaled dot product attention mechanism with the softmax activation function, which is computationally expensive and requires large compute capabilities for both training and inference. This paper investigates substituting the MLP and attention mechanism usually adopted for transformer architecture with an architecture based on incorporating a gated linear unit (GLU) activation function structure with the aim of reducing the computational cost. The equalized experimental assessments conducted in this work show that the proposed modification with the targeted reductions in computational complexity offers competitive performance compared to the selected baseline architectures. The results are significantly in support of the aims of this work, in which the focus was to extensively utilize GLU-based MLPs, establishing a more efficient but capable alternative to the traditional MLP and the attention mechanism as the core component in the design of transformer architectures.
CVMar 4, 2024
NiNformer: A Network in Network Transformer with Token Mixing Generated Gating FunctionAbdullah Nazhat Abdullah, Tarkan Aydin
The attention mechanism is the primary component of the transformer architecture; it has led to significant advancements in deep learning spanning many domains and covering multiple tasks. In computer vision, the attention mechanism was first incorporated in the Vision Transformer ViT, and then its usage has expanded into many tasks in the vision domain, such as classification, segmentation, object detection, and image generation. While the attention mechanism is very expressive and capable, it comes with the disadvantage of being computationally expensive and requiring datasets of considerable size for effective optimization. To address these shortcomings, many designs have been proposed in the literature to reduce the computational burden and alleviate the data size requirements. Examples of such attempts in the vision domain are the MLP-Mixer, the Conv-Mixer, the Perciver-IO, and many more attempts with different sets of advantages and disadvantages. This paper introduces a new computational block as an alternative to the standard ViT block. The newly proposed block reduces the computational requirements by replacing the normal attention layers with a Network in Network structure, therefore enhancing the static approach of the MLP-Mixer with a dynamic learning of element-wise gating function generated by a token mixing process. Extensive experimentation shows that the proposed design provides better performance than the baseline architectures on multiple datasets applied in the image classification task of the vision domain.