Xiang Liu

h-index11

4papers

45citations

Novelty44%

AI Score34

Ranked #115,175 of 194,257 authors (top 59%)#25,327 in LG (top 63%)

4 Papers

23.9LGMay 26, 2025Code

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

Peijie Dong, Zhenheng Tang, Xiang Liu et al.

Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks only focus on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), ignoring the agentic capabilities - workflow, tool use/function call, long-context understanding and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression impacts LLMs' agentic abilities. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) quantization (GPTQ, AWQ) and pruning (Wanda, SparseGPT), and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5 7B-32B), and distilled reasoning LLMs (DeepSeek-R1-Distill). Our experiments reveal compression tradeoffs: 4-bit quantization preserves workflow generation and tool use (1%-3% drop) but degrades real-world application accuracy by 10%-15%. We introduce ERank, Top-k Ranking Correlation and Energy to systematize analysis. ACBench provides actionable insights for optimizing LLM compression in agentic scenarios. The code can be found in https://github.com/pprp/ACBench.

22.0LGFeb 24, 2025

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Zhenheng Tang, Xiang Liu, Qian Wang et al.

Motivated by reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted much attention from researchers. However, current methods predominantly emphasize maintaining the performance of compressed LLMs, as measured by perplexity or simple accuracy on tasks of common sense knowledge QA and basic arithmetic reasoning. In this blog, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. Then, we propose a lottery LLM hypothesis suggesting that for a given LLM and task, there exists a smaller lottery LLM capable of producing the same performance as the original LLM with the assistance of multi-step reasoning and external tools. Based on the review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess, which are currently overlooked in existing methods.

6.3IVJan 23, 2024

An Efficient Implicit Neural Representation Image Codec Based on Mixed Autoregressive Model for Low-Complexity Decoding

Xiang Liu, Jiahong Chen, Bin Chen et al.

Displaying high-quality images on edge devices, such as augmented reality devices, is essential for enhancing the user experience. However, these devices often face power consumption and computing resource limitations, making it challenging to apply many deep learning-based image compression algorithms in this field. Implicit Neural Representation (INR) for image compression is an emerging technology that offers two key benefits compared to cutting-edge autoencoder models: low computational complexity and parameter-free decoding. It also outperforms many traditional and early neural compression methods in terms of quality. In this study, we introduce a new Mixed AutoRegressive Model (MARM) to significantly reduce the decoding time for the current INR codec, along with a new synthesis network to enhance reconstruction quality. MARM includes our proposed AutoRegressive Upsampler (ARU) blocks, which are highly computationally efficient, and ARM from previous work to balance decoding time and reconstruction quality. We also propose enhancing ARU's performance using a checkerboard two-stage decoding strategy. Moreover, the ratio of different modules can be adjusted to maintain a balance between quality and speed. Comprehensive experiments demonstrate that our method significantly improves computational efficiency while preserving image quality. With different parameter settings, our method can achieve over a magnitude acceleration in decoding time without industrial level optimization, or achieve state-of-the-art reconstruction quality compared with other INR codecs. To the best of our knowledge, our method is the first INR-based codec comparable with Hyperprior in both decoding speed and quality while maintaining low complexity.

3.5HCJun 12, 2016

SenseFlow: An Experimental Study for Tracking People

Kai Li, Chau Yuen, Salil S. Kanhere et al.

The main challenges in large-scale people tracking are the recognition of people density in a specific area and tracking the people flow path. To address these challenges, we present SenseFlow, a lightweight people tracking system. SenseFlow utilises off-the-shelf devices which sniff probe requests periodically polled by user's smartphones in a passive manner. We demonstrate the feasibility of SenseFlow by building a proof-of-concept prototype and undertaking extensive evaluations in real-world settings. We deploy the system in one laboratory to study office hours of researchers, a crowded public area in city to evaluate the scalability and performance "in the wild", and four classrooms in the university to monitor the number of students. We also evaluate SenseFlow with varying walking speeds and different models of smartphones to investigate the people flow tracking performance.