Yubin Lee

AR
h-index27
3papers
29citations
Novelty38%
AI Score36

3 Papers

68.4CYMar 16
Spatial Disparities in Fire Shelter Accessibility: Capacity Challenges in the Palisades and Eaton Fires

Su Yeon Han, Yubin Lee, Jooyoung Yoo et al.

The increasing frequency and severity of wildfire in California, exacerbated by prolonged drought and environmental changes, pose significant challenges to urban community resilience and equitable emergency response. The study investigates issues of accessibility to shelters during the Palisades and Eaton Fires which started in January 2025 in Southern California that led to over 180,000 displacements and the loss of 16,000 structures. Despite coordinated efforts of many organizations' emergency assistance, shelter shortages left many evacuees without safety or accessible refuge. This research aims to measure shelter accessibility during the fires' peak, evaluate whether existing shelter capacity met the demand, and identify spatial disparities in access. Findings reveal severe shelter shortages and pronounced inequities in access to shelters, particularly in geographically isolated regions and mountainous areas. To address these challenges, we implemented shelter placement strategies using both capacity-based and distance-based approaches, demonstrating potential improvements in accessibility and equity. The findings underscore the critical need for strategic shelter planning and infrastructure development to enhance disaster readiness and reduce vulnerability in regions that frequently experience wildfires.

ARMar 21, 2024
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

Yoonsung Kim, Changhun Oh, Jinwoo Hwang et al.

Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to their limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations in state-of-the-art continuous learning systems: (1) they focus on computations for retraining, while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than a state-of-the-art GPU-based continuous learning systems, Ekya and EOMU, respectively, while consuming 254x less power.

ARJul 14, 2025
Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving

Wonung Kim, Yubin Lee, Yoonsung Kim et al.

Transformers are the driving force behind today's Large Language Models (LLMs), serving as the foundation for their performance and versatility. Yet, their compute and memory costs grow with sequence length, posing scalability challenges for long-context inferencing. In response, the algorithm community is exploring alternative architectures, such as state space models (SSMs), linear attention, and recurrent neural networks (RNNs), which we refer to as post-transformers. This shift presents a key challenge: building a serving system that efficiently supports both transformer and post-transformer LLMs within a unified framework. To address this challenge, we analyze the performance characteristics of transformer and post-transformer LLMs. Despite their algorithmic differences, both are fundamentally limited by memory bandwidth under batched inference due to attention in transformers and state updates in post-transformers. Further analyses suggest two additional insights: (1) state update operations, unlike attention, incur high hardware cost, making per-bank PIM acceleration inefficient, and (2) different low-precision arithmetic methods offer varying accuracy-area tradeoffs, while we identify Microsoft's MX as the Pareto-optimal choice. Building on these insights, we design Pimba as an array of State-update Processing Units (SPUs), each shared between two banks to enable interleaved access to PIM. Each SPU includes a State-update Processing Engine (SPE) that comprises element-wise multipliers and adders using MX-based quantized arithmetic, enabling efficient execution of state update and attention operations. Our evaluation shows that, compared to LLM-optimized GPU and GPU+PIM systems, Pimba achieves up to 4.1x and 2.1x higher token generation throughput, respectively.