LG · Mar 8

One-for-All Model Initialization with Frequency-Domain Knowledge

arXiv:2603.07523v1
AI Analysis

This work provides a novel method for more flexible and efficient knowledge transfer from pre-trained models to new architectures, which is significant for researchers and practitioners working with large models and diverse downstream tasks. It offers an incremental improvement over existing parameter transfer methods.

This paper addresses the challenge of reusing knowledge from large pre-trained models across different model architectures. It argues that foundational, task-agnostic knowledge is encoded in the low-frequency components of model weights, and proposes FRONT, a framework that uses the Discrete Cosine Transform to extract this "learngene" for training-free initialization of models of arbitrary size. Reported results include up to 15x faster convergence in vision tasks and an average 40.5% reduction in training FLOPs in language tasks.

Transferring knowledge by fine-tuning large-scale pre-trained networks has become a standard paradigm for downstream tasks, yet the knowledge of a pre-trained model is tightly coupled to its monolithic architecture, which restricts flexible reuse across models of varying scales. In response to this challenge, recent approaches typically resort to either parameter selection, which fails to capture the interdependent structure of this knowledge, or parameter prediction using generative models, which depends on impractical access to large network collections. In this paper, we empirically demonstrate that a model's foundational, task-agnostic knowledge, its "learngene", is encoded within the low-frequency components of its weights and can be efficiently inherited by downstream models. Based on this insight, we propose FRONT (FRequency dOmain kNowledge Transfer), a novel framework that uses the Discrete Cosine Transform (DCT) to isolate the low-frequency "learngene". This learngene can be adapted to initialize models of arbitrary size via simple truncation or padding, a process that is entirely training-free. For enhanced performance, we propose an optional low-cost refinement process that introduces a spectral regularizer to further improve the learngene's transferability. Extensive experiments demonstrate that FRONT achieves state-of-the-art performance, accelerates convergence by up to 15 times in vision tasks, and reduces training FLOPs by an average of 40.5% in language tasks.
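
The truncation-or-padding idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (dct_matrix, extract_low_freq, init_from_learngene), the choice of an orthonormal DCT-II applied to a single 2-D weight matrix, and the zero-padding of missing high frequencies are all assumptions made for illustration. The abstract does not specify how FRONT treats each layer, and the sketch leaves out any amplitude rescaling that may be needed when the target size differs from the source size.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)    # DC row needs extra scaling to be orthonormal
    return m


def extract_low_freq(weight, keep_rows, keep_cols):
    """Hypothetical helper: keep the top-left (low-frequency) block of the 2-D DCT."""
    rows, cols = weight.shape
    coeffs = dct_matrix(rows) @ weight @ dct_matrix(cols).T
    return coeffs[:keep_rows, :keep_cols].copy()


def init_from_learngene(gene, out_rows, out_cols):
    """Hypothetical helper: truncate or zero-pad the coefficients, then invert the DCT."""
    padded = np.zeros((out_rows, out_cols))
    r = min(gene.shape[0], out_rows)
    c = min(gene.shape[1], out_cols)
    padded[:r, :c] = gene[:r, :c]  # high frequencies absent from the gene stay zero
    return dct_matrix(out_rows).T @ padded @ dct_matrix(out_cols)


# Usage: a stand-in "pre-trained" weight matrix initializes a smaller and a larger layer.
rng = np.random.default_rng(0)
pretrained = rng.standard_normal((64, 64))
gene = extract_low_freq(pretrained, 16, 16)
small_init = init_from_learngene(gene, 32, 32)
large_init = init_from_learngene(gene, 128, 96)
print(small_init.shape, large_init.shape)
```

Because the transform is orthonormal, keeping every coefficient and inverting at the original size reproduces the weight matrix exactly, so all of the compression in this sketch comes from discarding high-frequency coefficients. No gradient steps are involved, which matches the training-free property the abstract describes.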

