LG PF · Mar 21, 2025

V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms

Javier J. Poveda Rodrigo, Mohamed Amine Ahmdi, Alessio Burrello, Daniele Jahier Pagliari, Luca Benini

arXiv:2503.17422v1 · h-index: 22

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for efficient LLM deployment on emerging open-hardware RISC-V platforms, offering a domain-specific optimization that is incremental in nature.

The paper tackles the problem of optimizing LLM inference on RISC-V CPUs, achieving speedups over its baseline of up to 2.9x on DeepSeek R1 Distill Llama 8B and up to 3.0x on DeepSeek R1 Distill QWEN 14B.

The recent exponential growth of Large Language Models (LLMs) has relied on GPU-based systems. However, CPUs are emerging as a flexible and lower-cost alternative, especially when targeting inference and reasoning workloads. RISC-V is rapidly gaining traction in this area, given its open and vendor-neutral ISA. However, RISC-V hardware for LLM workloads and the corresponding software ecosystem are not yet fully mature or streamlined, as they require domain-specific tuning. This paper aims to fill this gap, focusing on optimizing LLM inference on the Sophon SG2042, the first commercially available many-core RISC-V CPU with vector processing capabilities. On two recent state-of-the-art LLMs optimized for reasoning, DeepSeek R1 Distill Llama 8B and DeepSeek R1 Distill QWEN 14B, we achieve 4.32/2.29 tokens/s for token generation and 6.54/3.68 tokens/s for prompt processing, with a speedup of up to 2.9x/3.0x compared to our baseline.
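To make the reported throughputs more tangible, the sketch below converts them into per-token decode latency and a rough end-to-end time for one request. It is a back-of-the-envelope calculation, not the authors' methodology: it assumes prefill (prompt processing) and decode (token generation) rates stay constant regardless of context length, whereas in practice decode typically slows as the context grows. The helper names and the 100-token prompt / 500-token output lengths are hypothetical; only the tokens/s figures come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throughput:
    """Reported throughputs, in tokens per second."""
    prompt_tok_s: float      # prompt processing (prefill phase)
    generation_tok_s: float  # token generation (decode phase)


# Figures reported in the abstract for the Sophon SG2042.
REPORTED = {
    "DeepSeek R1 Distill Llama 8B": Throughput(6.54, 4.32),
    "DeepSeek R1 Distill QWEN 14B": Throughput(3.68, 2.29),
}


def ms_per_generated_token(t: Throughput) -> float:
    """Average decode latency per generated token, in milliseconds."""
    return 1000.0 / t.generation_tok_s


def estimated_wall_time_s(t: Throughput, prompt_tokens: int, generated_tokens: int) -> float:
    """First-order estimate: prefill time plus decode time.

    Hypothetical model: assumes both rates are constant over the whole request.
    """
    return prompt_tokens / t.prompt_tok_s + generated_tokens / t.generation_tok_s


if __name__ == "__main__":
    for name, t in REPORTED.items():
        print(
            f"{name}: {ms_per_generated_token(t):.0f} ms/token, "
            f"~{estimated_wall_time_s(t, 100, 500):.0f} s for 100 prompt + 500 generated tokens"
        )
```

Under these assumptions, decode dominates the total: a long reasoning trace costs far more time than the prompt, which is why token-generation throughput matters most for reasoning models.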

