PL · Nov 20, 2025
Operon: Incremental Construction of Ragged Data via Named Dimensions
Sungbin Moon, Jiho Park, Suyoung Hwang et al.
Modern data processing workflows frequently encounter ragged data: collections with variable-length elements that arise naturally in domains like natural language processing, scientific measurements, and autonomous AI agents. Existing workflow engines lack native support for tracking the shapes and dependencies inherent to ragged data, forcing users to manage complex indexing and dependency bookkeeping manually. We present Operon, a Rust-based workflow engine that addresses these challenges through a novel formalism of named dimensions with explicit dependency relations. Operon provides a domain-specific language where users declare pipelines with dimension annotations that are statically verified for correctness, while the runtime system dynamically schedules tasks as data shapes are incrementally discovered during execution. We formalize the mathematical foundation for reasoning about partial shapes and prove that Operon's incremental construction algorithm guarantees deterministic and confluent execution in parallel settings. The system's explicit modeling of partially-known states enables robust persistence and recovery mechanisms, while its per-task multi-queue architecture achieves efficient parallelism across heterogeneous task types. Empirical evaluation demonstrates that Operon outperforms an existing workflow engine, with a 14.94x reduction in baseline overhead, while maintaining near-linear end-to-end output rates as workloads scale, making it particularly suitable for large-scale data generation pipelines in machine learning applications.
AI · Aug 25, 2025
Spacer: Towards Engineered Scientific Inspiration
Minhyeong Lee, Suyoung Hwang, Seunghyun Moon et al.
Recent advances in LLMs have made automated scientific research the next frontline in the path to artificial superintelligence. However, these systems are bound either to tasks of narrow scope or to the limited creative capabilities of LLMs. We propose Spacer, a scientific discovery system that develops creative and factually grounded concepts without external intervention. Spacer attempts to achieve this via 'deliberate decontextualization,' an approach that disassembles information into atomic units (keywords) and draws creativity from unexplored connections between them. Spacer consists of (i) Nuri, an inspiration engine that builds keyword sets, and (ii) the Manifesting Pipeline, which refines these sets into elaborate scientific statements. Nuri extracts novel, high-potential keyword sets from a keyword graph built from 180,000 academic publications in biological fields. The Manifesting Pipeline finds links between keywords, analyzes their logical structure, validates their plausibility, and ultimately drafts original scientific concepts. According to our experiments, the evaluation metric of Nuri accurately classifies high-impact publications with an AUROC score of 0.737. Our Manifesting Pipeline also successfully reconstructs core concepts from the latest top-journal articles solely from their keyword sets. An LLM-based scoring system estimates that this reconstruction was sound in over 85% of cases. Finally, our embedding space analysis shows that outputs from Spacer are significantly more similar to leading publications than those from SOTA LLMs.
AR · Mar 1, 2025
T-REX: A 68-567 μs/token, 0.41-3.95 μJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET
Seunghyun Moon, Mao Li, Gregory Chen et al.
This work introduces novel training and post-training compression schemes to reduce external memory access during transformer model inference. Additionally, a new control flow mechanism, called dynamic batching, and a novel buffer architecture, termed a two-direction accessible register file, further reduce external memory access while improving hardware utilization.
MTRL-SCI · Aug 28, 2021
Impact of Surface and Pore Characteristics on Fatigue Life of Laser Powder Bed Fusion Ti-6Al-4V Alloy Described by Neural Network Models
Seunghyun Moon, Ruimin Ma, Ross Attardo et al.
In this study, the effects of surface roughness and pore characteristics on the fatigue lives of laser powder bed fusion (LPBF) Ti-6Al-4V parts were investigated. The 197 fatigue bars were printed using the same laser power but with varied scanning speeds. These variations in scanning speed led to variations in the geometries of microscale pores, which were characterized using micro-computed tomography. To generate differences in surface roughness among the fatigue bars, half of the samples were grit-blasted and the other half machined. Fatigue behaviors were analyzed with respect to surface roughness and statistics of the pores. For the grit-blasted samples, the contour laser scan in the LPBF strategy led to a pore-depletion zone isolating surface and internal pores with different features. For the machined samples, where surface pores resemble internal pores, the fatigue life was highly correlated with the average pore size and projected pore area in the plane perpendicular to the stress direction. Finally, a machine learning model using a drop-out neural network (DONN) was employed to link surface and pore features to the fatigue data (logN), and good prediction accuracy was demonstrated. Besides predicting fatigue lives, the DONN can also estimate the prediction uncertainty.