Georgios Papandroulidakis

h-index22
2papers

2 Papers

ARAug 12, 2025
OISMA: On-the-fly In-memory Stochastic Multiplication Architecture for Matrix-Multiplication Workloads

Shady Agwa, Yihan Pan, Georgios Papandroulidakis et al.

Artificial Intelligence models are currently driven by a significant up-scaling of their complexity, with massive matrix multiplication workloads representing the major computational bottleneck. In-memory computing architectures are proposed to avoid the Von Neumann bottleneck. However, both digital/binary-based and analogue in-memory computing architectures suffer from various limitations, which significantly degrade the performance and energy efficiency gains. This work proposes OISMA, a novel in-memory computing architecture that utilizes the computational simplicity of a quasi-stochastic computing domain (Bent-Pyramid system), while keeping the same efficiency, scalability, and productivity of digital memories. OISMA converts normal memory read operations into in-situ stochastic multiplication operations with a negligible cost. An accumulation periphery then accumulates the output multiplication bitstreams, achieving the matrix multiplication functionality. Extensive matrix multiplication benchmarking was conducted to analyze the accuracy of the Bent-Pyramid system, using matrix dimensions ranging from 4x4 to 512x512. The accuracy results show a significant decrease in the average relative Frobenius error, from 9.42% (for 4x4) to 1.81% (for 512x512), compared to 64-bit double precision floating-point format. A 1T1R OISMA array of 4 KB capacity was implemented using a commercial 180nm technology node and in-house RRAM technology. At 50 MHz, OISMA achieves 0.891 TOPS/W and 3.98 GOPS/mm2 for energy and area efficiency, respectively, occupying an effective computing area of 0.804241 mm2. Scaling OISMA from 180nm to 22nm technology shows a significant improvement of two orders of magnitude in energy efficiency and one order of magnitude in area efficiency, compared to dense matrix multiplication in-memory computing architectures.

LGFeb 14, 2025
A Hybrid Edge Classifier: Combining TinyML-Optimised CNN with RRAM-CMOS ACAM for Energy-Efficient Inference

Kieran Woodward, Eiman Kanjo, Georgios Papandroulidakis et al.

In recent years, the development of smart edge computing systems to process information locally is on the rise. Many near-sensor machine learning (ML) approaches have been implemented to introduce accurate and energy efficient template matching operations in resource-constrained edge sensing systems, such as wearables. To introduce novel solutions that can be viable for extreme edge cases, hybrid solutions combining conventional and emerging technologies have started to be proposed. Deep Neural Networks (DNN) optimised for edge application alongside new approaches of computing (both device and architecture -wise) could be a strong candidate in implementing edge ML solutions that aim at competitive accuracy classification while using a fraction of the power of conventional ML solutions. In this work, we are proposing a hybrid software-hardware edge classifier aimed at the extreme edge near-sensor systems. The classifier consists of two parts: (i) an optimised digital tinyML network, working as a front-end feature extractor, and (ii) a back-end RRAM-CMOS analogue content addressable memory (ACAM), working as a final stage template matching system. The combined hybrid system exhibits a competitive trade-off in accuracy versus energy metric with $E_{front-end}$ = $96.23 nJ$ and $E_{back-end}$ = $1.45 nJ$ for each classification operation compared with 78.06$μ$J for the original teacher model, representing a 792-fold reduction, making it a viable solution for extreme edge applications.