Seungwoo Hong

h-index: 11
2 papers

2 Papers

2.4 NI May 21
Latency in Real-Time 3D Volumetric Streaming: A Comprehensive Study

Seungwoo Hong, Hosun Yoon, Seong Moon et al.

Real-time 3D volumetric streaming is a transformative technology that enables the seamless transmission and rendering of high-fidelity 3D models, enhancing applications in virtual reality (VR), augmented reality (AR), gaming, telepresence, and remote collaboration. However, latency remains a major challenge, affecting immersion, causing motion sickness, and disrupting real-time interactions. Addressing these latency issues is essential for improving user experience and ensuring system efficiency. This study conducts a comprehensive latency measurement and analysis within a real-time volumetric streaming environment. We systematically break down the streaming process into three key layers: the application layer, the transport protocol layer, and the network layer. By evaluating each layer in a real-world system, we identify latency bottlenecks, quantify their impact, and uncover the underlying causes of delay. Based on these findings, we propose targeted optimization strategies to mitigate latency and enhance system responsiveness. Through this research, we establish best practices and innovative solutions to improve the efficiency, scalability, and overall user experience of real-time 3D volumetric streaming. Our insights contribute to advancing the field, paving the way for more immersive and responsive digital environments.

AR Dec 13, 2024
Panacea: Novel DNN Accelerator using Accuracy-Preserving Asymmetric Quantization and Energy-Saving Bit-Slice Sparsity

Dongyun Kam, Myeongji Yun, Sunwoo Yoo et al.

Low bit-precisions and their bit-slice sparsity have recently been studied to accelerate general matrix-multiplications (GEMM) during large-scale deep neural network (DNN) inference. While conventional symmetric quantization facilitates low-resolution processing with bit-slice sparsity for both weights and activations, the accuracy loss caused by the asymmetric distribution of activations is unacceptable, especially for large-scale DNNs. To mitigate this accuracy loss, recent studies have actively utilized asymmetric quantization for activations without requiring additional operations. However, state-of-the-art asymmetric quantization produces numerous nonzero slices that cannot be compressed and skipped by recent bit-slice GEMM accelerators, naturally consuming more processing energy to handle the quantized DNN models. To simultaneously achieve high accuracy and hardware efficiency for large-scale DNN inference, this paper proposes an Asymmetrically-Quantized bit-Slice GEMM (AQS-GEMM) for the first time. In contrast to previous bit-slice computing, which only skips operations of zero slices, the AQS-GEMM compresses frequent nonzero slices, generated by asymmetric quantization, and skips their operations. To increase the slice-level sparsity of activations, we also introduce two algorithm-hardware co-optimization methods: a zero-point manipulation and a distribution-based bit-slicing. To support the proposed AQS-GEMM and optimizations at the hardware level, we newly introduce a DNN accelerator, Panacea, which efficiently handles sparse/dense workloads of the tiled AQS-GEMM to increase data reuse and utilization. Panacea supports a specialized dataflow and run-length encoding to maximize data reuse and minimize external memory accesses, significantly improving its hardware efficiency. Our benchmark evaluations show Panacea outperforms existing DNN accelerators.
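
The abstract's point that asymmetric quantization yields frequent nonzero slices can be illustrated with a small, generic sketch. This is not the AQS-GEMM or Panacea implementation. The function names (`asymmetric_quantize`, `split_slices`), the 8-bit width, the 4-bit slice size, and the sample activations are all assumptions chosen for illustration. The sketch applies textbook affine quantization with a zero-point, splits each code into a high and a low 4-bit slice, and counts how often each high slice occurs.

```python
from collections import Counter


def asymmetric_quantize(values, bits=8):
    """Affine (asymmetric) quantization to unsigned integers.

    Assumes the input holds at least two distinct values, so the
    scale is nonzero.
    """
    lo, hi = min(values), max(values)
    qmax = (1 << bits) - 1
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)  # integer code that represents 0.0
    codes = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def split_slices(codes, slice_bits=4):
    """Split each unsigned code into (high slice, low slice)."""
    mask = (1 << slice_bits) - 1
    return [(c >> slice_bits, c & mask) for c in codes]


# Hypothetical activations: most values cluster near zero, while one
# negative and one large positive outlier skew the range.
activations = [-1.0, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 3.0]

codes, scale, zero_point = asymmetric_quantize(activations)
high_slice_counts = Counter(high for high, _ in split_slices(codes))

print("zero-point:", zero_point)
print("high-slice histogram:", dict(high_slice_counts))
```

In this toy input the values near zero map to codes near the zero-point, so their high slices share a common nonzero value instead of being zero. A design that only skips zero slices finds little to skip here, which is the gap the abstract says AQS-GEMM targets by compressing frequent nonzero slices.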