Changhun Oh

h-index13

5papers

55citations

Novelty51%

AI Score40

Ranked #73,796 of 194,257 authors (top 38%)#262 in AR (top 41%)

5 Papers

8.6QUANT-PHSep 23, 2023

Tight bounds on Pauli channel learning without entanglement

Senrui Chen, Changhun Oh, Sisi Zhou et al.

Quantum entanglement is a crucial resource for learning properties from nature, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize states, measurements, and operations that are separable between the main system of interest and an ancillary system. Interestingly, we show that these algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. Within this setting, we prove a tight lower bound for Pauli channel learning without entanglement that closes the gap between the best-known upper and lower bound. In particular, we show that $Θ(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $Θ(\varepsilon^{-2})$ copies of the Pauli channel. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for Pauli noise characterization.

8.0ARMar 21, 2024Code

DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

Yoonsung Kim, Changhun Oh, Jinwoo Hwang et al.

Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to their limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations in state-of-the-art continuous learning systems: (1) they focus on computations for retraining, while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than a state-of-the-art GPU-based continuous learning systems, Ekya and EOMU, respectively, while consuming 254x less power.

3.3ARApr 11, 2025

MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization

Daeun Kim, Jinwoo Hwang, Changhun Oh et al.

Diffusion Transformer (DiT) has driven significant progress in image generation tasks. However, DiT inferencing is notoriously compute-intensive and incurs long latency even on datacenter-scale GPUs, primarily due to its iterative nature and heavy reliance on GEMM operations inherent to its encoder-based structure. To address the challenge, prior work has explored quantization, but achieving low-precision quantization for DiT inferencing with both high accuracy and substantial speedup remains an open problem. To this end, this paper proposes MixDiT, an algorithm-hardware co-designed acceleration solution that exploits mixed Microscaling (MX) formats to quantize DiT activation values. MixDiT quantizes the DiT activation tensors by selectively applying higher precision to magnitude-based outliers, which produce mixed-precision GEMM operations. To achieve tangible speedup from the mixed-precision arithmetic, we design a MixDiT accelerator that enables precision-flexible multiplications and efficient MX precision conversions. Our experimental results show that MixDiT delivers a speedup of 2.10-5.32 times over RTX 3090, with no loss in FID.

2.3ARNov 17, 2025

Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration

Changhun Oh, Seongryong Oh, Jinwoo Hwang et al.

3D Gaussian Splatting (3DGS) rendering in real-time on resource-constrained devices is essential for delivering immersive augmented and virtual reality (AR/VR) experiences. However, existing solutions struggle to achieve high frame rates, especially for high-resolution rendering. Our analysis identifies the sorting stage in the 3DGS rendering pipeline as the major bottleneck due to its high memory bandwidth demand. This paper presents Neo, which introduces a reuse-and-update sorting algorithm that exploits temporal redundancy in Gaussian ordering across consecutive frames, and devises a hardware accelerator optimized for this algorithm. By efficiently tracking and updating Gaussian depth ordering instead of re-sorting from scratch, Neo significantly reduces redundant computations and memory bandwidth pressure. Experimental results show that Neo achieves up to 10.0x and 5.6x higher throughput than state-of-the-art edge GPU and ASIC solution, respectively, while reducing DRAM traffic by 94.5% and 81.3%. These improvements make high-quality and low-latency on-device 3D rendering more practical.

1.2COMP-PHOct 17, 2025Code

Towards Symmetry-Aware Efficient Simulation of Quantum Systems and Beyond

Min Chen, Minzhao Liu, Changhun Oh et al.

The efficient simulation of complex quantum systems remains a central challenge due to the exponential growth of Hilbert space with system size. Tensor network methods have long been established as powerful approximation schemes, and their efficiency can be further enhanced by incorporating physics-informed priors. A prominent example is symmetry: recent progress on $U(1)$-symmetric tensor networks, accelerated on GPUs and scaled to supercomputers, shows how conserved charges induce block-sparse structures that reduce computational cost and enable larger simulations. The same principle extends to general symmetries, inspiring equivariant neural networks in machine learning and guiding symmetry-preserving ansatze in variational quantum algorithms. Beyond symmetry, physics-informed design also includes strategies such as hybrid tensor networks and parallel sequential circuits, which pursue efficiency from complementary principles. This Perspective argues that physics-informed tensor networks, grounded in both symmetry and beyond-symmetry insights, provide unifying strategies for scalable approaches in quantum simulation, computation, and machine learning.