CLApr 24, 2025Code
An Empirical Study on Prompt Compression for Large Language ModelsZheng Zhang, Jinyi Li, Yihuai Lan et al.
Prompt engineering enables Large Language Models (LLMs) to perform a variety of tasks. However, lengthy prompts significantly increase computational complexity and economic costs. To address this issue, we study six prompt compression methods for LLMs, aiming to reduce prompt length while maintaining LLM response quality. In this paper, we present a comprehensive analysis covering aspects such as generation performance, model hallucinations, efficacy in multimodal tasks, word omission analysis, and more. We evaluate these methods across 13 datasets, including news, scientific articles, commonsense QA, math QA, long-context QA, and VQA datasets. Our experiments reveal that prompt compression has a greater impact on LLM performance in long contexts compared to short ones. In the Longbench evaluation, moderate compression even enhances LLM performance. Our code and data is available at https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression.
NAMar 27
Adaptive low-rank exponential integrators for large-scale differential Riccati equationJinyi Li, Dongping Li, Hua Yang
Matrix differential Riccati equation (DRE) typically exhibits transient and steady-state phases, posing challenges for fixed-step time integration methods, which may lack accuracy during transients or oversample in steady regimes. In this work, we propose adaptive low-rank matrix-valued exponential integrators for large-scale stiff DRE. The methods combine embedded exponential Rosenbrock-type schemes and adaptive step-size control, enabling an automatic adjustment to the evolving solution dynamics. This improves the accuracy during rapid transient phases while maintaining high accuracy in the steady state. Numerical experiments on benchmark problems demonstrate that the proposed adaptive integrators consistently improve accuracy and computational efficiency compared with fixed-step low-rank schemes.
CVMay 7
NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down MapsDijia Zhan, Jinyi Li, Chenxi Zheng et al.
Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, they often rely on incrementally updating memory graphs or scoring discrete path proposals, which restricts continuous spatial reasoning and creates discrete bottlenecks. We propose Top-Down VLN (TD-VLN), reformulating navigation as a one-step global path planning problem on pre-built top-down maps, supported by our newly constructed R2R-TopDown dataset. To solve this, we introduce NavOne, a unified framework that directly predicts dense path probabilities over multi-modal maps in a single end-to-end forward pass. NavOne features a Top-Down Map Fuser for joint multi-modal map representation, and extends Attention Residuals for spatial-aware depth mixing. Extensive experiments on R2R-TopDown show that NavOne achieves state-of-the-art performance among map-based VLN methods, with a planning-stage speedup of 8x over existing map-based baselines and 80x over egocentric methods, enabling highly efficient global navigation.
CVSep 18, 2025
Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric RectificationTuo Xiang, Xuemiao Xu, Bangzhen Liu et al.
The rapid growth of 3D digital content necessitates expandable recognition systems for open-world scenarios. However, existing 3D class-incremental learning methods struggle under extreme data scarcity due to geometric misalignment and texture bias. While recent approaches integrate 3D data with 2D foundation models (e.g., CLIP), they suffer from semantic blurring caused by texture-biased projections and indiscriminate fusion of geometric-textural cues, leading to unstable decision prototypes and catastrophic forgetting. To address these issues, we propose Cross-Modal Geometric Rectification (CMGR), a framework that enhances 3D geometric fidelity by leveraging CLIP's hierarchical spatial semantics. Specifically, we introduce a Structure-Aware Geometric Rectification module that hierarchically aligns 3D part structures with CLIP's intermediate spatial priors through attention-driven geometric fusion. Additionally, a Texture Amplification Module synthesizes minimal yet discriminative textures to suppress noise and reinforce cross-modal consistency. To further stabilize incremental prototypes, we employ a Base-Novel Discriminator that isolates geometric variations. Extensive experiments demonstrate that our method significantly improves 3D few-shot class-incremental learning, achieving superior geometric coherence and robustness to texture bias across cross-domain and within-domain settings.
CLMar 26, 2024
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language ModelsJinyi Li, Yihuai Lan, Lei Wang et al.
Prompt compression is an innovative method for efficiently condensing input prompts while preserving essential information. To facilitate quick-start services, user-friendly interfaces, and compatibility with common datasets and metrics, we present the Prompt Compression Toolkit (PCToolkit). This toolkit is a unified plug-and-play solution for compressing prompts in Large Language Models (LLMs), featuring cutting-edge prompt compressors, diverse datasets, and metrics for comprehensive performance evaluation. PCToolkit boasts a modular design, allowing for easy integration of new datasets and metrics through portable and user-friendly interfaces. In this paper, we outline the key components and functionalities of PCToolkit. We conducted evaluations of the compressors within PCToolkit across various natural language tasks, including reconstruction, summarization, mathematical problem-solving, question answering, few-shot learning, synthetic tasks, code completion, boolean expressions, multiple choice questions, and lies recognition.