Tergel Molom-Ochir

h-index: 2

Papers: 4

Citations: 6

Novelty: 66%

AI Score: 38

Ranked #85,418 of 194,257 authors (top 44%); #18,955 in LG (top 47%)

4 Papers

2.6 | LG | Jul 12, 2024

MonoSparse-CAM: Efficient Tree Model Processing via Monotonicity and Sparsity in CAMs

Tergel Molom-Ochir, Brady Taylor, Hai Li et al.

While tree-based machine learning (TBML) models exhibit superior performance compared to neural networks on tabular data and hold promise for energy-efficient acceleration using aCAM arrays, their ideal deployment on hardware with explicit exploitation of TBML structure and aCAM circuitry remains a challenging task. In this work, we present MonoSparse-CAM, a new CAM-based optimization technique that exploits TBML sparsity and monotonicity in CAM circuitry to further advance processing performance. Our results indicate that MonoSparse-CAM reduces energy consumption by up to 28.56x compared to raw processing and by 18.51x compared to state-of-the-art techniques, while improving the efficiency of computation by at least 1.68x.

2.3 | AR | Nov 24, 2025

CAMformer: Associative Memory is All You Need

Tergel Molom-Ochir, Benjamin F. Morris, Mark Horton et al.

Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10x energy efficiency, up to 4x higher throughput, and 6-8x lower area compared to state-of-the-art accelerators--while maintaining near-lossless accuracy.
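The abstract mentions hierarchical two-stage top-k filtering without giving details. As a rough software illustration only, the sketch below assumes a generic scheme: split the attention scores into fixed-size groups, keep the top k of each group, then select the final top k from the surviving candidates. The function name `two_stage_top_k`, the `group_size` parameter, and the example scores are hypothetical and are not taken from the paper; the actual hardware pipeline may differ.

```python
import heapq


def two_stage_top_k(scores, k, group_size):
    """Return the indices of the k largest scores using two filtering stages.

    Hypothetical sketch, not the CAMformer implementation.
    Stage 1: keep the k best (score, index) pairs inside each group.
    Stage 2: keep the k best pairs among all stage-1 survivors.
    Because every group keeps k candidates, the result matches a flat top-k.
    """
    candidates = []
    for start in range(0, len(scores), group_size):
        end = min(start + group_size, len(scores))
        group = [(scores[i], i) for i in range(start, end)]
        candidates.extend(heapq.nlargest(k, group))  # stage 1: local filter
    winners = heapq.nlargest(k, candidates)          # stage 2: global filter
    return sorted(index for _, index in winners)


# Illustrative similarity scores between one query and eight keys.
example_scores = [3, 9, 1, 7, 8, 2, 6, 5]
selected = two_stage_top_k(example_scores, k=3, group_size=4)
```

Keeping k candidates per group makes the two-stage result identical to a single global top-k; a design that keeps fewer per group would trade exactness for less work in the second stage.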

7.1 | LG | Feb 3, 2025

Hamming Attention Distillation: Binarizing Keys and Queries for Efficient Long-Context Transformers

Mark Horton, Tergel Molom-Ochir, Peter Liu et al.

Pre-trained transformer models with extended context windows are notoriously expensive to run at scale, often limiting real-world deployment due to their high computational and memory requirements. In this paper, we introduce Hamming Attention Distillation (HAD), a novel framework that binarizes keys and queries in the attention mechanism to achieve significant efficiency gains. By converting keys and queries into {-1, +1} vectors and replacing dot-product operations with efficient Hamming distance computations, our method drastically reduces computational overhead. Additionally, we incorporate attention matrix sparsification to prune low-impact activations, which further reduces the cost of processing long-context sequences. Despite these aggressive compression strategies, our distilled approach preserves a high degree of representational power, leading to substantially improved accuracy compared to prior transformer binarization methods. We evaluate HAD on a range of tasks and models, including the GLUE benchmark, ImageNet, and QuALITY, demonstrating state-of-the-art performance among binarized Transformers while drastically reducing the computational costs of long-context inference. We implement HAD in custom hardware simulations, demonstrating superior performance characteristics compared to a custom hardware implementation of standard attention. HAD achieves just 1.78% performance losses on GLUE compared to 9.08% in state-of-the-art binarization work, and 2.5% performance losses on ImageNet compared to 12.14%, all while targeting custom hardware with a 79% area reduction and 87% power reduction compared to its standard attention counterpart.
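The replacement of dot products with Hamming distances rests on a standard identity: for two {-1, +1} vectors of length d, the dot product equals d minus twice their Hamming distance. The sketch below checks that identity on toy data. It assumes simple sign binarization (zero mapped to +1); the helper names and the example vectors are hypothetical, and the paper's distillation procedure is not reproduced here.

```python
def binarize(vec):
    """Map each real value to +1 or -1 by sign (assumption: zero maps to +1)."""
    return [1 if x >= 0 else -1 for x in vec]


def hamming_distance(a, b):
    """Count the positions where the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def dot(a, b):
    """Ordinary dot product, used here only as a reference."""
    return sum(x * y for x, y in zip(a, b))


def hamming_score(q_bin, k_bin):
    """Attention score from Hamming distance: q.k = d - 2 * Hamming(q, k)."""
    return len(q_bin) - 2 * hamming_distance(q_bin, k_bin)


# Illustrative real-valued query and keys (made-up numbers).
query = [0.7, -1.2, 0.1, -0.4]
keys = [[0.5, -0.3, -0.9, 0.2], [0.9, -0.8, 0.6, -0.1]]

q_bin = binarize(query)
k_bins = [binarize(k) for k in keys]
scores = [hamming_score(q_bin, k) for k in k_bins]
```

In hardware, the Hamming distance of bit-packed vectors can be computed with XOR and a population count, which is the kind of operation that makes this formulation cheaper than multiply-accumulate arithmetic.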

2.3 | SP | Feb 7, 2021

WiSleep: Inferring Sleep Duration at Scale Using Passive WiFi Sensing

Priyanka Mary Mammen, Camellia Zakaria, Tergel Molom-Ochir et al.

Sleep deprivation is a public health concern that significantly impacts one's well-being and performance. Sleep is an intimate experience, and state-of-the-art sleep monitoring solutions are highly personalized to individual users. With a motivation to expand sleep monitoring capabilities at a large scale and contribute sleep data to public health understanding, we present WiSleep, a system for inferring sleep duration using smartphone network connections that are passively sensed from WiFi infrastructure. We propose an unsupervised ensemble model of Bayesian change point detection, validating it over a user study among 20 students living in campus dormitories and a private home. Our results find WiSleep outperforming prior techniques for users with irregular sleep patterns while yielding an average 88.50% accuracy within 60 minutes sleep time error and 39 minutes wake-up time error. This is comparable to client-side methods, albeit utilizing coarse-grained information. Additionally, we utilize our approach to predict sleep and wake-up times from a user study of more than 1000 student users, demonstrating results similar to prior findings on students' sleep patterns. Finally, we show that WiSleep can process data from twenty thousand users on a single commodity server, allowing it to scale to large campus populations with low server requirements.