Axial Neural Networks for Dimension-Free Foundation Models
The work addresses inefficiencies in training foundation models for physics problems such as PDEs by offering a dimension-free approach that could improve scalability and adaptability. The contribution is incremental, however, because it builds on existing parameter-sharing structures.
The paper tackles the challenge of training foundation models on physics data with varying dimensionalities by proposing Axial Neural Networks (XNNs), a dimension-agnostic architecture that generalizes across tensor dimensions. Experiments show that XNNs perform competitively with the original models from which they are converted and generalize better to unseen dimensions, highlighting the benefits of multidimensional pretraining.
The advent of foundation models in AI has significantly advanced general-purpose learning, enabling remarkable capabilities in zero-shot inference and in-context learning. However, training such models on physics data, including solutions to partial differential equations (PDEs), poses a unique challenge due to varying dimensionalities across different systems. Traditional approaches either fix a maximum dimension or employ separate encoders for different dimensionalities, resulting in inefficiencies. To address this, we propose a dimension-agnostic neural network architecture, the Axial Neural Network (XNN), inspired by parameter-sharing structures such as Deep Sets and Graph Neural Networks. XNN generalizes across varying tensor dimensions while maintaining computational efficiency. We convert existing PDE foundation models into axial neural networks and evaluate their performance across three training scenarios: training from scratch, pretraining on multiple PDEs, and fine-tuning on a single PDE. Our experiments show that XNNs perform competitively with original models and exhibit superior generalization to unseen dimensions, highlighting the importance of multidimensional pretraining for foundation models.
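The parameter-sharing idea behind a dimension-agnostic layer can be illustrated with a minimal sketch. This is not the paper's XNN implementation: the function name `axial_conv`, the 3-point kernel, and the choice to average the per-axis results are assumptions made purely for illustration. The sketch applies one shared 1D kernel along every axis of the input, so the same parameters serve 1D, 2D, and 3D fields.

```python
import numpy as np

def axial_conv(u, kernel):
    """Apply one shared 1D kernel along every axis of `u` and average the results.

    Hypothetical illustration of axis-wise parameter sharing; it assumes every
    axis of `u` is at least as long as `kernel`, so the shape is preserved.
    """
    u = np.asarray(u, dtype=float)
    if u.ndim == 0:
        raise ValueError("input must have at least one axis")
    out = np.zeros_like(u)
    for axis in range(u.ndim):
        # The same kernel (the same parameters) is reused for each axis.
        out += np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, u
        )
    return out / u.ndim

# One set of weights handles inputs of different dimensionality.
kernel = np.array([0.25, 0.5, 0.25])
rng = np.random.default_rng(0)
for shape in [(16,), (16, 16), (8, 8, 8)]:
    field = rng.standard_normal(shape)
    assert axial_conv(field, kernel).shape == shape
```

Because the kernel is shared and the per-axis results are combined symmetrically, permuting the axes of the input permutes the output in the same way, which is the kind of symmetry that Deep Sets style parameter sharing is built around.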