LGSep 1, 2024Code
Hyper-Compression: Model Compression via HyperfunctionFenglei Fan, Juntong Fan, Dayang Wang et al.
The rapid growth of large models' size has far outpaced that of computing resources. To bridge this gap, encouraged by the parsimonious relationship between genotype and phenotype in the brain's growth and development, we propose the so-called Hyper-Compression that turns the model compression into the issue of parameter representation via a hyperfunction. Specifically, it is known that the trajectory of some low-dimensional dynamic systems can fill the high-dimensional space eventually. Thus, Hyper-Compression, using these dynamic systems as the hyperfunctions, represents the parameters of the target network by their corresponding composition number or trajectory length. This suggests a novel mechanism for model compression, substantially different from the existing pruning, quantization, distillation, and decomposition. Along this direction, we methodologically identify a suitable dynamic system with the irrational winding as the hyperfunction and theoretically derive its associated error bound. Next, guided by our theoretical insights, we propose several engineering twists to make the Hyper-Compression pragmatic and effective. Lastly, systematic and comprehensive experiments on \textcolor{black}{NLP models such as LLaMA and Qwen series and vision models} confirm that Hyper-Compression enjoys the following \textbf{PNAS} merits: 1) \textbf{P}referable compression ratio; 2) \textbf{N}o post-hoc retraining; 3) \textbf{A}ffordable inference time; and 4) \textbf{S}hort compression time. It compresses LLaMA2-7B in an hour and achieves close-to-int4-quantization performance, without retraining and with a performance drop of less than 1\%. We have open-sourced our code in https://github.com/Juntongkuki/Hyper-Compression.git for free download and evaluation.
CVAug 9, 2025
Large Language Model Evaluated Stand-alone Attention-Assisted Graph Neural Network with Spatial and Structural Information Interaction for Precise Endoscopic Image SegmentationJuntong Fan, Shuyi Fan, Debesh Jha et al.
Accurate endoscopic image segmentation on the polyps is critical for early colorectal cancer detection. However, this task remains challenging due to low contrast with surrounding mucosa, specular highlights, and indistinct boundaries. To address these challenges, we propose FOCUS-Med, which stands for Fusion of spatial and structural graph with attentional context-aware polyp segmentation in endoscopic medical imaging. FOCUS-Med integrates a Dual Graph Convolutional Network (Dual-GCN) module to capture contextual spatial and topological structural dependencies. This graph-based representation enables the model to better distinguish polyps from background tissues by leveraging topological cues and spatial connectivity, which are often obscured in raw image intensities. It enhances the model's ability to preserve boundaries and delineate complex shapes typical of polyps. In addition, a location-fused stand-alone self-attention is employed to strengthen global context integration. To bridge the semantic gap between encoder-decoder layers, we incorporate a trainable weighted fast normalized fusion strategy for efficient multi-scale aggregation. Notably, we are the first to introduce the use of a Large Language Model (LLM) to provide detailed qualitative evaluations of segmentation quality. Extensive experiments on public benchmarks demonstrate that FOCUS-Med achieves state-of-the-art performance across five key metrics, underscoring its effectiveness and clinical potential for AI-assisted colonoscopy.
CVJul 11, 2025
Compress Any Segment Anything Model (SAM)Juntong Fan, Zhiwei Hao, Jianqiang Shen et al.
Due to the excellent performance in yielding high-quality, zero-shot segmentation, Segment Anything Model (SAM) and its variants have been widely applied in diverse scenarios such as healthcare and intelligent manufacturing. Therefore, effectively compressing SAMs has become an increasingly pressing practical need. In this study, we propose Birkhoff, a novel data-free compression algorithm for SAM and its variants. Unlike quantization, pruning, distillation, and other compression methods, Birkhoff embodies versatility across model types, agility in deployment, faithfulness to the original model, and compactness in model size. Specifically, Birkhoff introduces a novel compression algorithm: Hyper-Compression, whose core principle is to find a dense trajectory to turn a high-dimensional parameter vector into a low-dimensional scalar. Furthermore, Birkhoff designs a dedicated linear layer operator, HyperLinear, to fuse decompression and matrix multiplication to significantly accelerate inference of the compressed SAMs. Extensive experiments on 18 SAMs in the COCO, LVIS, and SA-1B datasets show that Birkhoff performs consistently and competitively in compression time, compression ratio, post-compression performance, and inference speed. For example, Birkhoff can achieve a compression ratio of 5.17x on SAM2-B, with less than 1% performance drop without using any fine-tuning data. Moreover, the compression is finished within 60 seconds for all models.