Konstantinos Balaskas

h-index5

7papers

46citations

Novelty50%

AI Score40

Ranked #72,266 of 194,257 authors (top 37%)#256 in AR (top 40%)

7 Papers

3.3ARMar 15, 2022

Approximate Decision Trees For Machine Learning Classification on Tiny Printed Circuits

Konstantinos Balaskas, Georgios Zervakis, Kostas Siozios et al.

Although Printed Electronics (PE) cannot compete with silicon-based systems in conventional evaluation metrics, e.g., integration density, area and performance, PE offers attractive properties such as on-demand ultra-low-cost fabrication, flexibility and non-toxicity. As a result, it targets application domains that are untouchable by lithography-based silicon electronics and thus have not yet seen much proliferation of computing. However, despite the attractive characteristics of PE, the large feature sizes in PE prohibit the realization of complex printed circuits, such as Machine Learning (ML) classifiers. In this work, we exploit the hardware-friendly nature of Decision Trees for machine learning classification and leverage the hardware-efficiency of the approximate design in order to generate approximate ML classifiers that are suitable for tiny, ultra-resource constrained, and battery-powered printed applications.

7.0ARApr 17

Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition

José Juan Hernández Morales, Georgios Mentzos, Frank Hannig et al.

The paradigm shift towards local and on-device inference under stringent resource constraints is represented by the tiny machine learning (TinyML) domain. The primary goal of \gls{tml} is to integrate intelligence into tiny, low-cost devices under strict resource, energy, and latency constraints. However, the ultra-resource-constrained nature of these devices can lead to increased inference execution time, which can be detrimental in latency critical applications. At the same time, TinyML applications are often associated with sensitive data. As such, latency optimization approaches that rely on training samples are infeasible when such data is unavailable, proprietary, or sensitive, highlighting a pressing need for optimization approaches that do not require access to the training dataset and can be applied directly to pre-trained models. Replacing costly multiplications with more hardware-efficient operations, such as shifts and additions, has been proposed as an effective method for reducing inference latency. However, post-training power-of-two (Po2) approaches are scarce and, in many cases, lead to unacceptable accuracy loss. In this work, we propose a framework that applies approximate matrix decomposition to a given CNN in order to optimize hardware implementations subject to strict constraints and without any need of re-training or fine-tuning steps. The genetic algorithm-driven framework explores different matrix decompositions and resulting multiplier-less CNN accelerator designs for FPGA targets. A comprehensive evaluation of different TinyML benchmarks demonstrates our framework's efficacy in generating latency-optimized implementations that satisfy strict accuracy and resource constraints, achieving an average 33% latency improvement with an average accuracy loss of 1.3% compared to typical systolic array-based FPGA accelerators.

4.1LGJan 28, 2025

Late Breaking Results: Energy-Efficient Printed Machine Learning Classifiers with Sequential SVMs

Spyridon Besias, Ilias Sertaridis, Florentia Afentaki et al.

Printed Electronics (PE) provide a mechanically flexible and cost-effective solution for machine learning (ML) circuits, compared to silicon-based technologies. However, due to large feature sizes, printed classifiers are limited by high power, area, and energy overheads, which restricts the realization of battery-powered systems. In this work, we design sequential printed bespoke Support Vector Machine (SVM) circuits that adhere to the power constraints of existing printed batteries while minimizing energy consumption, thereby boosting battery life. Our results show 6.5x energy savings while maintaining higher accuracy compared to the state of the art.

4.1LGFeb 3, 2025

Compact Yet Highly Accurate Printed Classifiers Using Sequential Support Vector Machine Circuits

Ilias Sertaridis, Spyridon Besias, Florentia Afentaki et al.

Printed Electronics (PE) technology has emerged as a promising alternative to silicon-based computing. It offers attractive properties such as on-demand ultra-low-cost fabrication, mechanical flexibility, and conformality. However, PE are governed by large feature sizes, prohibiting the realization of complex printed Machine Learning (ML) classifiers. Leveraging PE's ultra-low non-recurring engineering and fabrication costs, designers can fully customize hardware to a specific ML model and dataset, significantly reducing circuit complexity. Despite significant advancements, state-of-the-art solutions achieve area efficiency at the expense of considerable accuracy loss. Our work mitigates this by designing area- and power-efficient printed ML classifiers with little to no accuracy degradation. Specifically, we introduce the first sequential Support Vector Machine (SVM) classifiers, exploiting the hardware efficiency of bespoke control and storage units and a single Multiply-Accumulate compute engine. Our SVMs yield on average 6x lower area and 4.6% higher accuracy compared to the printed state of the art.

1.2ARApr 14, 2025

Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability

Aikaterini Maria Panteleaki, Konstantinos Balaskas, Georgios Zervakis et al.

As Deep Neural Networks (DNNs) continue to drive advancements in artificial intelligence, the design of hardware accelerators faces growing concerns over embodied carbon footprint due to complex fabrication processes. 3D integration improves performance but introduces sustainability challenges, making carbon-aware optimization essential. In this work, we propose a carbon-efficient design methodology for 3D DNN accelerators, leveraging approximate computing and genetic algorithm-based design space exploration to optimize Carbon Delay Product (CDP). By integrating area-efficient approximate multipliers into Multiply-Accumulate (MAC) units, our approach effectively reduces silicon area and fabrication overhead while maintaining high computational accuracy. Experimental evaluations across three technology nodes (45nm, 14nm, and 7nm) show that our method reduces embodied carbon by up to 30% with negligible accuracy drop.

2.3SPAug 27, 2025

Invited Paper: Feature-to-Classifier Co-Design for Mixed-Signal Smart Flexible Wearables for Healthcare at the Extreme Edge

Maha Shatta, Konstantinos Balaskas, Paula Carolina Lozano Duarte et al.

Flexible Electronics (FE) offer a promising alternative to rigid silicon-based hardware for wearable healthcare devices, enabling lightweight, conformable, and low-cost systems. However, their limited integration density and large feature sizes impose strict area and power constraints, making ML-based healthcare systems-integrating analog frontend, feature extraction and classifier-particularly challenging. Existing FE solutions often neglect potential system-wide solutions and focus on the classifier, overlooking the substantial hardware cost of feature extraction and Analog-to-Digital Converters (ADCs)-both major contributors to area and power consumption. In this work, we present a holistic mixed-signal feature-to-classifier co-design framework for flexible smart wearable systems. To the best of our knowledge, we design the first analog feature extractors in FE, significantly reducing feature extraction cost. We further propose an hardware-aware NAS-inspired feature selection strategy within ML training, enabling efficient, application-specific designs. Our evaluation on healthcare benchmarks shows our approach delivers highly accurate, ultra-area-efficient flexible systems-ideal for disposable, low-power wearable monitoring.

12.3LGDec 23, 2023

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Konstantinos Balaskas, Andreas Karatzas, Christos Sad et al.

Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for running sophisticated DNN-based services on resource constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework to compress DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning, in the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration so that the energy consumption is minimized whilst the prediction accuracy loss is retained at acceptable levels. Using our novel composite RL agent we are able to extract energy-efficient solutions without requiring retraining and/or fine tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves $39\%$ average energy reduction for $1.7\%$ average accuracy loss and outperforms significantly the state-of-the-art approaches.