ARDec 3, 2025
KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash ComputingLishuo Deng, Shaojie Xu, Jinwu Chen et al.
Deploying large language models (LLMs) on edge devices enables personalized agents with strong privacy and low cost. However, with tens to hundreds of billions of parameters, single-batch autoregressive inference suffers from extremely low arithmetic intensity, creating severe weight-loading and bandwidth pressures on resource-constrained platforms. Recent in-flash computing (IFC) solutions alleviate this bottleneck by co-locating weight-related linear computations in the decode phase with flash, yet still rely on DRAM for the key-value (KV) cache. As context length grows, the KV cache can exceed model weights in size, imposing prohibitive DRAM cost and capacity requirements. Attempts to offload KV cache to flash suffer from severe performance penalties. We propose KVNAND, the first DRAM-free, IFC-based architecture that stores both model weights and KV cache entirely in compute-enabled 3D NAND flash. KVNAND addresses the fundamental performance challenges of flash under intensive KV cache access by leveraging IFC for all memory-bound operations to reduce data transfer overhead, introducing head-group parallelism to boost throughput, and employing page-level KV cache mapping to align token access patterns with flash organization. In addition, we propose a design space exploration framework that evaluates discrete and compact KVNAND variants to balance weight and KV placement, automatically identifying the optimal design trade-off. These techniques mitigate latency, energy, and reliability concerns, turning flash into a practical medium for long-context KV storage. Evaluations on MHA 7B and GQA 70B LLMs show that KVNAND achieves 1.98\(\times\)/1.94\(\times\)/2.05\(\times\) geomean speedup at 128/1K/10K-token contexts compared to DRAM-equipped IFC designs and addresses out-of-memory failures at 100K context length.
LGAug 23, 2024
Robust Predictions with Ambiguous Time Delays: A Bootstrap StrategyJiajie Wang, Zhiyuan Jerry Lin, Wen Chen
In contemporary data-driven environments, the generation and processing of multivariate time series data is an omnipresent challenge, often complicated by time delays between different time series. These delays, originating from a multitude of sources like varying data transmission dynamics, sensor interferences, and environmental changes, introduce significant complexities. Traditional Time Delay Estimation methods, which typically assume a fixed constant time delay, may not fully capture these variabilities, compromising the precision of predictive models in diverse settings. To address this issue, we introduce the Time Series Model Bootstrap (TSMB), a versatile framework designed to handle potentially varying or even nondeterministic time delays in time series modeling. Contrary to traditional approaches that hinge on the assumption of a single, consistent time delay, TSMB adopts a nonparametric stance, acknowledging and incorporating time delay uncertainties. TSMB significantly bolsters the performance of models that are trained and make predictions using this framework, making it highly suitable for a wide range of dynamic and interconnected data environments.
LGFeb 15, 2025
Exploring the Role of Artificial Intelligence and Machine Learning in Process Optimization for Chemical IndustryZishuo Lin, Jiajie Wang, Zhe Yan et al.
The crucial field of Optical Chemical Structure Recognition (OCSR) aims to transform chemical structure photographs into machine-readable formats so that chemical databases may be efficiently stored and queried. Although a number of OCSR technologies have been created, little is known about how well they work in different picture deterioration scenarios. In this work, a new dataset of chemically structured images that have been systematically harmed graphically by compression, noise, distortion, and black overlays is presented. On these subsets, publicly accessible OCSR tools were thoroughly tested to determine how resilient they were to unfavorable circumstances. The outcomes show notable performance variation, underscoring each tool's advantages and disadvantages. Interestingly, MolScribe performed best under heavy compression (55.8% at 99%) and had the highest identification rate on undamaged photos (94.6%). MolVec performed exceptionally well against noise and black overlay (86.8% at 40%), although it declined under extreme distortion (<70%). With recognition rates below 30%, Decimer demonstrated strong sensitivity to noise and black overlay, but Imago had the lowest baseline accuracy (73.6%). The creative assessment of this study offers important new information about how well the OCSR tool performs when images deteriorate, as well as useful standards for tool development in the future.
LGMar 31, 2021
Collaborative Label Correction via Entropy ThresholdingHao Wu, Jiangchao Yao, Jiajie Wang et al.
Deep neural networks (DNNs) have the capacity to fit extremely noisy labels nonetheless they tend to learn data with clean labels first and then memorize those with noisy labels. We examine this behavior in light of the Shannon entropy of the predictions and demonstrate the low entropy predictions determined by a given threshold are much more reliable as the supervision than the original noisy labels. It also shows the advantage in maintaining more training samples than previous methods. Then, we power this entropy criterion with the Collaborative Label Correction (CLC) framework to further avoid undesired local minimums of the single network. A range of experiments have been conducted on multiple benchmarks with both synthetic and real-world settings. Extensive results indicate that our CLC outperforms several state-of-the-art methods.
CVFeb 10, 2018
Collaborative Learning for Weakly Supervised Object DetectionJiajie Wang, Jiangchao Yao, Ya Zhang et al.
Weakly supervised object detection has recently received much attention, since it only requires image-level labels instead of the bounding-box labels consumed in strongly supervised learning. Nevertheless, the save in labeling expense is usually at the cost of model accuracy. In this paper, we propose a simple but effective weakly supervised collaborative learning framework to resolve this problem, which trains a weakly supervised learner and a strongly supervised learner jointly by enforcing partial feature sharing and prediction consistency. For object detection, taking WSDDN-like architecture as weakly supervised detector sub-network and Faster-RCNN-like architecture as strongly supervised detector sub-network, we propose an end-to-end Weakly Supervised Collaborative Detection Network. As there is no strong supervision available to train the Faster-RCNN-like sub-network, a new prediction consistency loss is defined to enforce consistency of predictions between the two sub-networks as well as within the Faster-RCNN-like sub-networks. At the same time, the two detectors are designed to partially share features to further guarantee the model consistency at perceptual level. Extensive experiments on PASCAL VOC 2007 and 2012 data sets have demonstrated the effectiveness of the proposed framework.
CVNov 2, 2017
Deep Learning from Noisy Image Labels with Quality EmbeddingJiangchao Yao, Jiajie Wang, Ivor Tsang et al.
There is an emerging trend to leverage noisy image datasets in many visual recognition tasks. However, the label noise among the datasets severely degenerates the \mbox{performance of deep} learning approaches. Recently, one mainstream is to introduce the latent label to handle label noise, which has shown promising improvement in the network designs. Nevertheless, the mismatch between latent labels and noisy labels still affects the predictions in such methods. To address this issue, we propose a quality embedding model, which explicitly introduces a quality variable to represent the trustworthiness of noisy labels. Our key idea is to identify the mismatch between the latent and noisy labels by embedding the quality variables into different subspaces, which effectively minimizes the noise effect. At the same time, the high-quality labels is still able to be applied for training. To instantiate the model, we further propose a Contrastive-Additive Noise network (CAN), which consists of two important layers: (1) the contrastive layer estimates the quality variable in the embedding space to reduce noise effect; and (2) the additive layer aggregates the prior predictions and noisy labels as the posterior to train the classifier. Moreover, to tackle the optimization difficulty, we deduce an SGD algorithm with the reparameterization tricks, which makes our method scalable to big data. We conduct the experimental evaluation of the proposed method over a range of noisy image datasets. Comprehensive results have demonstrated CAN outperforms the state-of-the-art deep learning approaches.