Liwei Guo

h-index13

7papers

134citations

Novelty55%

AI Score48

Ranked #53,104 of 205,806 authors (top 26%)#1,090 in CR (top 15%)

7 Papers

85.6ETMay 29

GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement

Zhiwei Chen, Yijie Li, Yimo Zhang et al.

Non-contact material identification enables adaptive interaction for embodied intelligence yet faces challenges from geometry-induced variations (e.g., orientation, shape, distance) and single-modality ambiguities. In this paper, we present GaMi, a multimodal material identification system integrating mmWave and acoustic sensing to robustly operate under unconstrained geometric conditions. By leveraging the insight of shared geometric consistency between co-located bimodal sensors, GaMi employs an intra-sample cross-modal subtractive disentanglement framework. By semantically aligning modalities and subtracting the shared geometric context, it isolates intrinsic material features. Furthermore, GaMi incorporates inter-sample contrastive learning to correct the residual interference caused by cross-modal misalignment. Additionally, a pairing-based adaptation strategy between two modalities enables few-shot generalization across devices. Extensive evaluations on 20 materials show that GaMi achieves 95.2% accuracy, outperforming single-modality baselines across unseen geometric conditions.

LGAug 28, 2023Code

EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices

Rongjie Yi, Liwei Guo, Shiyun Wei et al.

Large language models (LLMs) such as GPTs and Mixtral-8x7B have revolutionized machine intelligence due to their exceptional abilities in generic ML tasks. Transiting LLMs from datacenters to edge devices brings benefits like better privacy and availability, but is challenged by their massive parameter size and thus unbearable runtime costs. To this end, we present EdgeMoE, an on-device inference engine for mixture-of-expert (MoE) LLMs -- a popular form of sparse LLM that scales its parameter size with almost constant computing complexity. EdgeMoE achieves both memory- and compute-efficiency by partitioning the model into the storage hierarchy: non-expert weights are held in device memory; while expert weights are held on external storage and fetched to memory only when activated. This design is motivated by a key observation that expert weights are bulky but infrequently used due to sparse activation. To further reduce the expert I/O swapping overhead, EdgeMoE incorporates two novel techniques: (1) expert-wise bitwidth adaptation that reduces the expert sizes with tolerable accuracy loss; (2) expert preloading that predicts the activated experts ahead of time and preloads it with the compute-I/O pipeline. On popular MoE LLMs and edge devices, EdgeMoE showcase significant memory savings and speedup over competitive baselines. The code is available at https://github.com/UbiquitousLearning/mllm.

42.2CRJun 1

PyFEX: Uncovering Evasive Python-based Threats via Resilient and Exhaustive Path Exploration

Meng Wang, Yue Ma, Majid Garoosi et al.

The rapid expansion of the Python ecosystem has fueled two distinct but converging threats: adversaries increasingly target the software supply chain via the Python Package Index (PyPI), while also building evasive, cross-platform malicious binaries compiled from source code written in Python. Current program analysis techniques struggle to address this dual threat. Static analysis based tools are often blinded by runtime obfuscation and compiled bytecode, while dynamic analysis based ones are fragile, prone to evasion by environmental guardrails, and often terminates prematurely due to unsatisfied dependencies. To overcome these limitations, we present PyFEX, a resilient forced-execution engine. PyFEX explores a program's behavioral space systematically by forcing execution across all conditional branches to bypass evasion checks. To address the fragility of dynamic execution, it introduces a novel resilient crash recovery mechanism that synthesizes dummy objects to satisfy failed operations at the runtime, allowing analysis to proceed past fatal errors, and employs path merging to mitigate path explosion. PyFEX further incorporates an automated entry identification mechanism that proactively discovers and invokes dormant functions, exposing malicious logic hidden within uncalled APIs. To demonstrate the efficacy of this engine, we built PyFEXScan, a proof-of-concept malware detector built on top of PyFEX. Evaluated against both known malicious PyPI packages and real-world compiled binaries, PyFEX exposes critical behaviors missed by the existing state-of-the-art tools. In a live deployment on PyPI, PyFEXScan discovered 212 previously unknown malicious packages accounting for over 91,648 downloads, underscoring the necessity of resilient, exhaustive analysis for securing the Python ecosystem.

LGJul 11, 2022

STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining

Liwei Guo, Wonkyo Choe, Felix Xiaozhu Lin

Natural Language Processing (NLP) inference is seeing increasing adoption by mobile applications, where on-device inference is desirable for crucially preserving user data privacy and avoiding network roundtrips. Yet, the unprecedented size of an NLP model stresses both latency and memory, creating a tension between the two key resources of a mobile device. To meet a target latency, holding the whole model in memory launches execution as soon as possible but increases one app's memory footprints by several times, limiting its benefits to only a few inferences before being recycled by mobile memory management. On the other hand, loading the model from storage on demand incurs IO as long as a few seconds, far exceeding the delay range satisfying to a user; pipelining layerwise model loading and execution does not hide IO either, due to the high skewness between IO and computation delays. To this end, we propose Speedy Transformer Inference (STI). Built on the key idea of maximizing IO/compute resource utilization on the most important parts of a model, STI reconciles the latency v.s. memory tension via two novel techniques. First, model sharding. STI manages model parameters as independently tunable shards, and profiles their importance to accuracy. Second, elastic pipeline planning with a preload buffer. STI instantiates an IO/compute pipeline and uses a small buffer for preload shards to bootstrap execution without stalling at early stages; it judiciously selects, tunes, and assembles shards per their importance for resource-elastic execution, maximizing inference accuracy. Atop two commodity SoCs, we build STI and evaluate it against a wide range of NLP tasks, under a practical range of target latencies, and on both CPU and GPU. We demonstrate that STI delivers high accuracies with 1-2 orders of magnitude lower memory, outperforming competitive baselines.

DBMay 18, 2024

The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving

Pai Zeng, Zhenyu Ning, Jieru Zhao et al.

We survey the large language model (LLM) serving area to understand the intricate dynamics between cost-efficiency and accuracy, which is magnified by the growing need for longer contextual understanding when deploying models at a massive scale. Our findings reveal that works in this space optimize along three distinct but conflicting goals: improving serving context length (C), improving serving accuracy (A), and improving serving performance (P). Drawing inspiration from the CAP theorem in databases, we propose a CAP principle for LLM serving, which suggests that any optimization can improve at most two of these three goals simultaneously. Our survey categorizes existing works within this framework. We find the definition and continuity of user-perceived measurement metrics are crucial in determining whether a goal has been met, akin to prior CAP databases in the wild. We recognize the CAP principle for LLM serving as a guiding principle, rather than a formal theorem, to inform designers of the inherent and dynamic trade-offs in serving models. As serving accuracy and performance have been extensively studied, this survey focuses on works that extend serving context length and address the resulting challenges.

OSOct 15, 2021

Minimum Viable Device Drivers for ARM TrustZone

Liwei Guo, Felix Xiaozhu Lin

While TrustZone can isolate IO hardware, it lacks drivers for modern IO devices. Rather than porting drivers, we propose a novel approach to deriving minimum viable drivers: developers exercise a full driver and record the driver/device interactions; the processed recordings, dubbed driverlets, are replayed in the TEE at run time to access IO devices. Driverlets address two key challenges: correctness and expressiveness, for which they build on a key construct called interaction template. The interaction template ensures faithful reproduction of recorded IO jobs (albeit on new IO data); it accepts dynamic input values; it tolerates nondeterministic device behaviors. We demonstrate driverlets on a series of sophisticated devices, making them accessible to TrustZone for the first time to our knowledge. Our experiments show that driverlets are secure, easy to build, and incur acceptable overhead (1.4x -2.7x compared to native drivers). Driverlets fill a critical gap in the TrustZone TEE, realizing its long-promised vision of secure IO.

CRFeb 17, 2019

Let the Cloud Watch Over Your IoT File Systems

Liwei Guo, Yiying Zhang, Felix Xiaozhu Lin

Smart devices produce security-sensitive data and keep them in on-device storage for persistence. The current storage stack on smart devices, however, offers weak security guarantees: not only because the stack depends on a vulnerable commodity OS, but also because smart device deployment is known weak on security measures. To safeguard such data on smart devices, we present a novel storage stack architecture that i) protects file data in a trusted execution environment (TEE); ii) outsources file system logic and metadata out of TEE; iii) running a metadata-only file system replica in the cloud for continuously verifying the on-device file system behaviors. To realize the architecture, we build Overwatch, aTrustZone-based storage stack. Overwatch addresses unique challenges including discerning metadata at fine grains, hiding network delays, and coping with cloud disconnection. On a suite of three real-world applications, Overwatch shows moderate security overheads.