Rethinking Parameter Sharing as Graph Coloring for Structured Compression
This work addresses a critical bottleneck in structured compression: exhaustive search over cross-layer parameter-sharing configurations is computationally infeasible. Removing that bottleneck benefits the deployment of deep models.
The paper tackles high inference-time memory usage in deep models by proposing a principled method for sharing parameters across layers, recasting the choice of sharing groups as a graph coloring problem with a geometric criterion based on the Hessian spectrum. It achieves higher compression ratios with smaller accuracy degradation than state-of-the-art heuristic strategies.
Modern deep models have massive parameter counts, leading to high inference-time memory usage that limits practical deployment. Parameter sharing, a form of structured compression, effectively reduces redundancy, but existing approaches remain heuristic: they are restricted to adjacent layers and lack a systematic analysis of cross-layer sharing. Extending sharing across multiple layers, however, makes the configuration space grow exponentially, so exhaustive search becomes computationally infeasible and forms a critical bottleneck for parameter sharing. We recast parameter sharing from a group-theoretic perspective as introducing structural symmetries in the model's parameter space. A sharing configuration can be described by a coloring function $\alpha: L \rightarrow C$, where $L$ is the set of layer indices and $C$ the set of sharing classes; it determines inter-layer sharing groups while preserving structural symmetry. To determine the coloring function, we propose a second-order geometric criterion based on Taylor expansion and the Hessian spectrum. By projecting perturbations onto the Hessian's low-curvature eigensubspace, the criterion provides an analytic rule for selecting sharing groups that minimize performance impact, yielding a principled and scalable configuration procedure. Across diverse architectures and tasks, the resulting method, Geo-Sharing, consistently outperforms state-of-the-art heuristic sharing strategies, achieving higher compression ratios with smaller accuracy degradation.
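The coloring function and the second-order criterion can be made concrete with a small sketch. The abstract does not state the exact criterion, so the code below is only an illustration under stated assumptions, not the paper's algorithm: every layer has the same flattened weight shape, layers in one sharing class are tied to the class mean, the first-order Taylor term is negligible (the model is near a stationary point), and the Hessian's eigenpairs are already available. All function names (`sharing_perturbation`, `second_order_cost`, `low_curvature_fraction`, `best_coloring`) are hypothetical.

```python
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sharing_perturbation(weights, coloring):
    """Flattened perturbation caused by tying each sharing class to its mean.

    weights:  list of per-layer weight vectors (assumed equal length)
    coloring: coloring[l] is the sharing class (color) of layer l
    """
    delta = [[0.0] * len(w) for w in weights]
    for color in set(coloring):
        members = [l for l, c in enumerate(coloring) if c == color]
        for j in range(len(weights[0])):
            mean_j = sum(weights[l][j] for l in members) / len(members)
            for l in members:
                delta[l][j] = mean_j - weights[l][j]
    return [x for row in delta for x in row]

def second_order_cost(eigenpairs, delta):
    """Second-order Taylor term 0.5 * delta^T H delta, in the Hessian eigenbasis.

    eigenpairs: list of (eigenvalue, orthonormal eigenvector) for H (assumed given)
    """
    return 0.5 * sum(lam * dot(vec, delta) ** 2 for lam, vec in eigenpairs)

def low_curvature_fraction(eigenpairs, delta, k):
    """Share of ||delta||^2 lying in the span of the k lowest-curvature eigenvectors."""
    total = dot(delta, delta)
    if total == 0.0:
        return 1.0  # no perturbation at all, so nothing leaves the subspace
    low = sorted(eigenpairs, key=lambda pair: pair[0])[:k]
    return sum(dot(vec, delta) ** 2 for _, vec in low) / total

def best_coloring(weights, eigenpairs, candidates):
    """Pick the candidate coloring with the smallest second-order cost."""
    return min(
        candidates,
        key=lambda col: second_order_cost(eigenpairs, sharing_perturbation(weights, col)),
    )

# Toy setup (made-up numbers): 3 layers with 2 weights each, diagonal Hessian.
weights = [[1.0, 0.0], [1.0, 2.0], [3.0, 0.0]]
curvatures = [10.0, 0.1, 10.0, 0.1, 10.0, 0.1]
eigenpairs = [
    (lam, [1.0 if i == j else 0.0 for j in range(6)])
    for i, lam in enumerate(curvatures)
]
candidates = [[0, 0, 1], [0, 1, 0], [0, 1, 1]]

for col in candidates:
    d = sharing_perturbation(weights, col)
    print(col, second_order_cost(eigenpairs, d), low_curvature_fraction(eigenpairs, d, k=3))
print("selected:", best_coloring(weights, eigenpairs, candidates))
```

In this toy case, tying layers 0 and 1 perturbs only the low-curvature coordinates, so that coloring has the smallest estimated loss increase. Scoring an explicit candidate list is a simplification for illustration; the abstract describes an analytic selection rule that avoids enumerating the exponentially large configuration space.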