Yiming Bian


2 Papers

19.6 · LG · Apr 22
Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

Yiming Bian, Joshua M. Akey

The scalability of long-context large language models is fundamentally limited by the quadratic memory cost of exact self-attention, which often causes out-of-memory (OOM) failures on modern hardware. Existing methods reduce memory cost to near-linear complexity but still assume that the full query, key, and value tensors fit in device memory. In this work, we remove this assumption by introducing CQS Divide, an operation derived from cyclic quorum set (CQS) theory that decomposes attention into a set of independent subsequence computations whose recomposition yields exactly the same result as full-sequence attention. Building on this decomposition, we introduce Stream-CQSA, a memory-adaptive scheduling framework that partitions attention into subproblems that fit within arbitrary memory budgets. This recasts attention from a logically monolithic operation into a collection of schedulable tasks, enabling flexible execution across devices without inter-device communication. Experiments demonstrate predictable memory scaling and show that exact attention over billion-token sequences can be executed on a single GPU via streaming, without altering the mathematical definition of attention or introducing approximation error.
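Why can attention be split into independent pieces and recombined exactly? The sketch below illustrates the general principle with a swapped-in technique: standard key/value chunking with log-sum-exp (online softmax) merging, for a single query. It is not CQS Divide and does not reproduce the cyclic-quorum-set partitioning or the Stream-CQSA scheduler. All function names (`partial_attention`, `merge_partials`, `chunked_attention`) are hypothetical. Each chunk returns only its local max score, local normalizer, and unnormalized output, so only one chunk of keys and values needs to be resident at a time.

```python
import math


def _scores(q, keys):
    """Scaled dot-product scores of one query against a list of keys."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]


def full_attention(q, keys, values):
    """Reference: exact softmax attention of one query over the whole sequence."""
    s = _scores(q, keys)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]


def partial_attention(q, keys, values):
    """Attention restricted to one chunk.

    Returns (local max score, local normalizer, unnormalized weighted sum of values),
    which is enough to merge chunks later without revisiting their keys/values.
    """
    s = _scores(q, keys)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    acc = [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
    return m, sum(w), acc


def merge_partials(parts):
    """Exactly recombine chunk results by rescaling each to a shared global max."""
    m_all = max(m for m, _, _ in parts)
    z_all = sum(z * math.exp(m - m_all) for m, z, _ in parts)
    dim_v = len(parts[0][2])
    return [
        sum(acc[j] * math.exp(m - m_all) for m, _, acc in parts) / z_all
        for j in range(dim_v)
    ]


def chunked_attention(q, keys, values, chunk_size):
    """Process keys/values in chunks of at most chunk_size, then merge exactly."""
    parts = [
        partial_attention(q, keys[i:i + chunk_size], values[i:i + chunk_size])
        for i in range(0, len(keys), chunk_size)
    ]
    return merge_partials(parts)
```

Because each chunk's contribution is rescaled by `exp(m_chunk - m_global)`, the merged result equals full-sequence attention up to floating-point rounding, for any chunk size. Peak working memory per step is bounded by `chunk_size` rather than by sequence length, which is the kind of property that lets a scheduler fit subproblems into a fixed memory budget.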

LG · May 8, 2023
A LSTM and Cost-Sensitive Learning-Based Real-Time Warning for Civil Aviation Over-limit

Yiming Bian

Over-limit events during passenger aircraft flights have drawn increasing attention in civil aviation because of the safety risks they pose. Addressing this issue requires real-time automated warning systems. In this study, a real-time warning model for civil aviation over-limit events is proposed based on QAR (Quick Access Recorder) data monitoring. First, attributes highly correlated with over-limit events are extracted from a large QAR dataset using the Spearman rank correlation coefficient. Next, because over-limit detection is a binary classification problem with imbalanced classes, cost-sensitive learning is incorporated into the LSTM model. Finally, the time-step length, the number of LSTM cells, and the learning rate of the LSTM model are tuned by grid search. The model is trained on a real dataset and evaluated on a validation set. Experimental results show that the proposed model achieves an F1 score of 0.991 and an accuracy of 0.978, indicating its effectiveness for real-time warning of civil aviation over-limit events.
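Two of the steps named above can be sketched independently of the LSTM itself: Spearman-based feature screening and a cost-sensitive binary cross-entropy loss. This is a minimal illustration under stated assumptions, not the paper's implementation. The selection threshold, the use of absolute correlation, and the inverse-class-frequency cost weights are all assumptions (the abstract does not say how costs were chosen), and the function names are hypothetical. Spearman's rho is computed as the Pearson correlation of average ranks, which handles ties.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def select_features(columns, labels, threshold):
    """Keep feature names whose |rho| with the 0/1 label is at least threshold (assumed rule)."""
    return [name for name, col in columns.items() if abs(spearman(col, labels)) >= threshold]


def inverse_frequency_weights(labels):
    """Assumed cost scheme: weight each class by n / (2 * class count)."""
    n, n_pos = len(labels), sum(labels)
    return n / (2 * n_pos), n / (2 * (n - n_pos))


def weighted_bce(probs, labels, w_pos, w_neg, eps=1e-12):
    """Cost-sensitive binary cross-entropy: misses on the rare class cost w_pos."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p)
    return total / len(labels)
```

With weights (1, 1) the loss reduces to ordinary binary cross-entropy. Raising `w_pos` above `w_neg` makes a missed over-limit event cost more than a false alarm, which is the basic idea behind cost-sensitive learning on imbalanced data. In practice this loss would replace the standard loss when training the LSTM.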