Arda Yurdakul

CR
6 papers
240 citations
Novelty: 42%
AI Score: 25

6 Papers

LG, Mar 26, 2023
Common Subexpression-based Compression and Multiplication of Sparse Constant Matrices

Emre Bilgili, Arda Yurdakul

In deep learning inference, model parameters are pruned and quantized to reduce the model size. Compression methods and common subexpression (CSE) elimination algorithms are applied to sparse constant matrices to deploy the models on low-cost embedded devices. However, state-of-the-art CSE elimination methods do not scale well to large matrices: they take hours to extract CSEs from a $200 \times 200$ matrix, and their matrix multiplication algorithms run longer than conventional matrix multiplication methods. Moreover, no existing compression method for matrices utilizes CSEs. To address these problems, this paper proposes a random search-based algorithm that extracts CSEs in the column pairs of a constant matrix. It produces an adder tree for a $1000 \times 1000$ matrix in a minute. To compress the adder tree, this paper presents a compression format that extends Compressed Sparse Row (CSR) to include CSEs. Compression rates of more than $50\%$ can be achieved compared to the original CSR format, and simulations for a single-core embedded system show that the matrix multiplication execution time can be reduced by $20\%$.
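To make the idea of a common subexpression in a column pair concrete, here is a minimal Python sketch. It assumes a 0/1 constant matrix and uses a simple greedy count of column pairs, not the random search described in the abstract; the function names `most_common_column_pair` and `multiply_with_cse` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def most_common_column_pair(matrix):
    """Return ((i, j), count) for the column pair that shares the most rows.

    Assumption: entries are 0 or 1, so a row containing 1s in columns i and j
    contributes the subexpression x[i] + x[j] to its output.
    """
    counts = Counter()
    for row in matrix:
        ones = [col for col, value in enumerate(row) if value == 1]
        counts.update(combinations(ones, 2))
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def multiply_with_cse(matrix, x, pair):
    """Compute matrix @ x, reusing x[i] + x[j] in rows where both columns are 1."""
    i, j = pair
    shared = x[i] + x[j]  # the common subexpression, computed once
    result = []
    for row in matrix:
        if row[i] == 1 and row[j] == 1:
            rest = sum(x[c] for c, v in enumerate(row) if v == 1 and c not in (i, j))
            result.append(shared + rest)
        else:
            result.append(sum(x[c] for c, v in enumerate(row) if v == 1))
    return result


A = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
x = [2, 3, 5, 7]

pair, count = most_common_column_pair(A)
y = multiply_with_cse(A, x, pair)
print(pair, count, y)
```

In this toy matrix, columns 0 and 2 are both set in three rows, so the sum `x[0] + x[2]` is computed once and reused three times. A full adder tree would repeat this extraction on the reduced matrix.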

CV, Mar 21, 2022
Image Classification on Accelerated Neural Networks

Ilkay Sikdokur, Inci Baytas, Arda Yurdakul

For image classification problems, various neural network models are commonly used due to their success in yielding high accuracies. The Convolutional Neural Network (CNN) is one of the most frequently used deep learning methods for image classification applications. It may produce extraordinarily accurate results with regard to its complexity. However, the more complex the model is, the longer it takes to train. This paper presents an FPGA-based acceleration design for the training phase of the fully connected layer of a basic CNN model that consists of one convolutional layer and one fully connected layer. The inference phase is also accelerated automatically, because the training phase includes inference. In this design, the convolutional layer is calculated by the host computer and the fully connected layer is calculated by an FPGA board. It should be noted that the training of the convolutional layer is not taken into account in this design and is left for future research. The results are quite encouraging: this FPGA design outperforms some state-of-the-art deep learning platforms, such as TensorFlow running on the host computer, by a factor of approximately 2 in both training and inference.
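The remark that training includes inference can be illustrated with a small Python sketch of one gradient-descent step for a fully connected layer. The squared-error loss, the absence of an activation function, and the names `forward` and `train_step` are assumptions made for illustration; the abstract does not specify these details.

```python
def forward(weights, bias, features):
    """Inference for a fully connected layer: one dot product per output unit."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def train_step(weights, bias, features, target, lr=0.1):
    """One gradient-descent step on 0.5 * sum((output - target)**2).

    The first thing the step does is run the forward pass, which is why
    accelerating training also accelerates inference.
    """
    outputs = forward(weights, bias, features)  # inference happens here
    errors = [o - t for o, t in zip(outputs, target)]
    for k, err in enumerate(errors):
        for n, f in enumerate(features):
            weights[k][n] -= lr * err * f  # dLoss/dw[k][n] = err * f
        bias[k] -= lr * err  # dLoss/db[k] = err
    return 0.5 * sum(e * e for e in errors)


weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]
features = [1.0, 2.0]  # stand-in for the convolutional layer's output
target = [1.0, 0.0]

first_loss = train_step(weights, bias, features, target)
second_loss = train_step(weights, bias, features, target)
print(first_loss, second_loss)
```

Here `features` plays the role of the convolutional layer output computed on the host, and `train_step` corresponds to the part the abstract assigns to the FPGA.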

LG, Jul 25, 2023
EdgeConvEns: Convolutional Ensemble Learning for Edge Intelligence

Ilkay Sikdokur, İnci M. Baytaş, Arda Yurdakul

Deep edge intelligence aims to deploy deep learning models that demand computationally expensive training in the edge network with limited computational power. Moreover, many deep edge intelligence applications require handling distributed data that cannot be transferred to a central server due to privacy concerns. Decentralized learning methods, such as federated learning, offer solutions where models are learned collectively by exchanging learned weights. However, they often require complex models that edge devices may not be able to handle, as well as multiple rounds of network communication, to achieve state-of-the-art performance. This study proposes a convolutional ensemble learning approach, coined EdgeConvEns, that facilitates training heterogeneous weak models on the edge and learning to ensemble them where data on the edge are heterogeneously distributed. Edge models are implemented and trained independently on Field-Programmable Gate Array (FPGA) devices with various computational capacities. Learned data representations are transferred to a central server, where the ensemble model is trained with the learned features received from the edge devices to boost the overall prediction performance. Extensive experiments demonstrate that EdgeConvEns can outperform the state-of-the-art performance with fewer communications and less data in various training scenarios.

AR, Sep 30, 2020
An Embedded RISC-V Core with Fast Modular Multiplication

Ömer Faruk Irmak, Arda Yurdakul

Privacy and security are among the biggest concerns in IoT. Encryption and authentication require large power budgets, which battery-operated IoT end-nodes do not have. Hardware accelerators designed for specific cryptographic operations provide little to no flexibility for future updates. Custom instruction solutions are smaller in area and provide more flexibility for new methods to be implemented. One drawback of custom instructions is that the processor has to wait for the operation to finish. As a result, the response time of the device to real-time events gets longer. In this work, we propose a processor with an extended custom instruction for modular multiplication, which typically blocks the processor for two cycles for any size of modular multiplication when used in Partial Execution mode. We adopted the embedded and compressed extensions of RISC-V for our proof-of-concept CPU. Our design is benchmarked on recent cryptographic algorithms in the field of elliptic-curve cryptography. Our CPU with 128-bit modular multiplication operates at 136 MHz on ASIC and 81 MHz on FPGA. It achieves up to 13x speedup over software implementations while reducing overall power consumption by up to 95% with 41% average area overhead over our base architecture.
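For readers unfamiliar with the operation being accelerated, the following Python sketch shows a textbook interleaved (shift-and-add) modular multiplication. The abstract does not state which algorithm the custom instruction implements, so this is only an illustrative software reference under that assumption; the function name `modmul_interleaved` and the `width` parameter are hypothetical.

```python
def modmul_interleaved(a, b, m, width=128):
    """Return (a * b) % m by scanning the bits of b from most to least significant.

    Assumes 0 <= a, b < m < 2**width. The accumulator stays below m after
    every step, so each reduction is a single conditional subtraction, which
    is the kind of step that maps naturally onto hardware.
    """
    if not (0 <= a < m and 0 <= b < m and m < (1 << width)):
        raise ValueError("operands must satisfy 0 <= a, b < m < 2**width")
    acc = 0
    for bit in range(width - 1, -1, -1):
        acc <<= 1  # doubling keeps acc < 2 * m
        if acc >= m:
            acc -= m
        if (b >> bit) & 1:
            acc += a  # adding a < m keeps acc < 2 * m
            if acc >= m:
                acc -= m
    return acc


# Example with a 127-bit prime modulus (2**127 - 1 is a Mersenne prime).
modulus = (1 << 127) - 1
a = 0x1234567890ABCDEF1234567890ABCDEF % modulus
b = 0x0FEDCBA9876543210FEDCBA987654321 % modulus
product = modmul_interleaved(a, b, modulus)
print(product == (a * b) % modulus)
```

The loop performs one iteration per operand bit, so a blocking implementation would stall the processor for the whole multiplication. That is the response-time cost the abstract's Partial Execution mode is meant to avoid.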

CR, Sep 30, 2018
IDMoB: IoT Data Marketplace on Blockchain

Kazım Rıfat Özyılmaz, Mehmet Doğan, Arda Yurdakul

Today, Internet of Things (IoT) devices are the powerhouse of data generation with their ever-increasing numbers and widespread penetration. Similarly, artificial intelligence (AI) and machine learning (ML) solutions are being integrated into all kinds of services, making products significantly "smarter". The centerpiece of these technologies is "data". IoT device vendors should be able to keep up with the increased throughput and come up with new business models. On the other hand, AI/ML solutions will produce better results if training data is diverse and plentiful. In this paper, we propose a blockchain-based, decentralized and trustless data marketplace where IoT device vendors and AI/ML solution providers may interact and collaborate. By facilitating a transparent data exchange platform, access to consented data will be democratized and the variety of services targeting end-users will increase. The proposed data marketplace is implemented as a smart contract on the Ethereum blockchain, and Swarm is used as the distributed storage platform.

CR, Sep 20, 2018
Designing a blockchain-based IoT infrastructure with Ethereum, Swarm and LoRa

Kazım Rıfat Özyılmaz, Arda Yurdakul

Today, the number of IoT devices in all aspects of life is increasing exponentially. The cities we live in are getting smarter and informing us about our surroundings in a contextual manner. However, there are significant challenges in deploying, managing and collecting data from these devices, in addition to the problem of storing and mining that data for higher-quality IoT services. Blockchain technology, even in today's nascent form, contains the pillars to create a common, distributed, trustless and autonomous infrastructure system. This paper describes a standardized IoT infrastructure in which data is stored on a DDoS-resistant, fault-tolerant, distributed storage service and data access is managed by a decentralized, trustless blockchain. The illustrated system uses LoRa as the emerging network technology, Swarm as the distributed data storage and Ethereum as the blockchain platform. Such a data backend will ensure high availability with minimal security risks while replacing traditional backend systems with a single "smart contract".