NI Mar 12, 2024
Towards a Dynamic Future with Adaptable Computing and Network Convergence (ACNC)
Masoud Shokrnezhad, Hao Yu, Tarik Taleb et al.
In the context of advancing 6G, a substantial paradigm shift is anticipated, highlighting comprehensive everything-to-everything interactions characterized by numerous connections and stringent adherence to Quality of Service/Experience (QoS/E) prerequisites. The imminent challenge stems from resource scarcity, prompting a deliberate transition to Computing-Network Convergence (CNC) as an auspicious approach for joint resource orchestration. While CNC-based mechanisms have garnered attention, their effectiveness in realizing future services, particularly in use cases like the Metaverse, may encounter limitations due to the continually changing nature of users, services, and resources. Hence, this paper presents the concept of Adaptable CNC (ACNC) as an autonomous Machine Learning (ML)-aided mechanism crafted for the joint orchestration of computing and network resources, catering to dynamic and voluminous user requests with stringent requirements. ACNC encompasses two primary functionalities: state recognition and context detection. Given the intricate nature of the user-service-computing-network space, the paper employs dimension reduction to generate live, holistic, abstract system states in a hierarchical structure. To address the challenges posed by dynamic changes, Continual Learning (CL) is employed, classifying the system state into contexts controlled by dedicated ML agents, enabling them to operate efficiently. These two functionalities are intricately linked within a closed loop overseen by the End-to-End (E2E) orchestrator to allocate resources. The paper introduces the components of ACNC, proposes a Metaverse scenario to exemplify ACNC's role in resource provisioning with Segment Routing v6 (SRv6), outlines ACNC's workflow, details a numerical analysis for efficiency assessment, and concludes with discussions on relevant challenges and potential avenues for future research.
LG Sep 17, 2025
SBVR: Summation of BitVector Representation for Efficient LLM Quantization
Wonjun Bang, Jongseok Park, Hongseung Yu et al.
With the advent of large language models (LLMs), numerous Post-Training Quantization (PTQ) strategies have been proposed to alleviate deployment barriers created by their enormous parameter counts. Quantization achieves compression by limiting the number of representable points in the data. Therefore, the key to achieving efficient quantization is selecting the optimal combination of representation points, or codes, for the given data. Existing PTQ solutions adopt two major approaches to this problem: Round-To-Nearest (RTN)-based methods and codebook-based methods. RTN-based methods map LLM weights onto uniformly distributed integer grids, failing to account for the Gaussian-like distribution of LLM weights. Codebook-based methods mitigate this issue by constructing distribution-aware codebooks; however, they suffer from random and strided memory access patterns, resulting in degraded inference speed that is exacerbated by the limited size of GPU L1 cache. To overcome these limitations, we propose a novel LLM quantization method, SBVR (Summation of BitVector Representation), that enables Gaussian-like code representation in a hardware-friendly manner for fast inference. SBVR maps weight values to non-uniform representation points whose distribution follows the actual distribution of LLM weights, enabling more accurate compression. Additionally, we design a custom CUDA kernel that allows matrix-vector multiplication directly in the SBVR format without decompression, thereby enabling high-performance execution of SBVR-compressed models. Our evaluations of SBVR on various models demonstrate state-of-the-art perplexity and accuracy benchmark performance while delivering a 2.21x to 3.04x end-to-end token-generation speedup over naive FP16 models in the 4-bit quantization regime.
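The contrast the abstract draws between uniform RTN grids and distribution-aware codebooks can be illustrated with a small sketch. This is not the SBVR method and does not reproduce its bitvector format or CUDA kernel. It assumes synthetic Gaussian weights and uses plain Lloyd iterations (1-D k-means) as a stand-in for a distribution-aware codebook. All function names (`uniform_grid`, `lloyd_codebook`, `quantize`, `mse`) and the weight scale of 0.02 are hypothetical choices made for illustration.

```python
import random


def uniform_grid(weights, bits):
    """Evenly spaced codes spanning [min, max]; nearest-code mapping on this grid is RTN."""
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1)  # assumes the weights are not all identical
    return [lo + k * step for k in range(levels)]


def quantize(weights, codes):
    """Map each weight to its nearest code."""
    return [min(codes, key=lambda c: abs(c - w)) for w in weights]


def lloyd_codebook(weights, bits, iterations=10):
    """Non-uniform codes refined by Lloyd iterations, starting from the uniform grid."""
    codes = uniform_grid(weights, bits)
    for _ in range(iterations):
        buckets = [[] for _ in codes]
        for w in weights:
            idx = min(range(len(codes)), key=lambda i: abs(codes[i] - w))
            buckets[idx].append(w)
        # Move each code to the mean of its bucket; keep the old code if the bucket is empty.
        codes = [sum(b) / len(b) if b else c for b, c in zip(buckets, codes)]
    return codes


def mse(original, quantized):
    """Mean squared quantization error."""
    return sum((x - y) ** 2 for x, y in zip(original, quantized)) / len(original)


random.seed(0)
# Assumption: synthetic Gaussian values standing in for LLM weights.
weights = [random.gauss(0.0, 0.02) for _ in range(2000)]

rtn_codes = uniform_grid(weights, bits=4)
cb_codes = lloyd_codebook(weights, bits=4)
rtn_err = mse(weights, quantize(weights, rtn_codes))
cb_err = mse(weights, quantize(weights, cb_codes))
print(f"RTN (uniform grid) MSE:     {rtn_err:.3e}")
print(f"Non-uniform codebook MSE:   {cb_err:.3e}")
```

Because the codebook starts from the uniform grid and each Lloyd step cannot increase the squared error, the non-uniform codes end up no worse than RTN on the same data, and they cluster near zero where Gaussian-like weights are dense. The sketch says nothing about inference speed, which is the memory-access issue the abstract attributes to codebook lookups.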