LG Jun 9, 2025

SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space

Zitong Huang, Mansooreh Montazerin, Ajitesh Srivastava

arXiv:2506.08270v3 4.1 h-index: 4

Originality: Highly original

AI Analysis

This work addresses the time-consuming, labor-intensive nature of manual network design and discretized neural architecture search, a burden for machine learning researchers and practitioners. It represents a novel paradigm rather than an incremental improvement.

The paper tackles the problem of designing neural networks by proposing a method that simultaneously optimizes architecture and weights in a continuous latent space, resulting in the discovery of sparse and compact networks with strong performance on synthetic regression tasks.

Designing neural networks typically relies on manual trial and error or a neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often discretizes architecture search and weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
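The latent-space search described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (`decode`, `loss_and_grad`, `search_latent`, the matrix `D`) is hypothetical. It assumes a fixed random linear map `D` as a stand-in for the trained autoencoder's decoder, a one-hidden-layer tanh network with scalar input, and an L1 penalty as the sparsity term. The compactness penalty and any architecture changes are omitted, and the step-halving rule is an addition to keep the loop stable.

```python
import math
import random

def decode(z, D):
    """Hypothetical decoder: map a latent point z to a flat weight vector."""
    return [sum(d * zk for d, zk in zip(row, z)) for row in D]

def loss_and_grad(z, D, xs, ys, h, lam):
    """Return MSE + L1 sparsity penalty, and its gradient w.r.t. z."""
    theta = decode(z, D)
    w1, b1, w2, b2 = theta[:h], theta[h:2 * h], theta[2 * h:3 * h], theta[3 * h]
    g = [0.0] * len(theta)  # gradient w.r.t. the decoded weights
    n = len(xs)
    loss = 0.0
    for x, y in zip(xs, ys):
        t = [math.tanh(w1[j] * x + b1[j]) for j in range(h)]
        err = sum(w2[j] * t[j] for j in range(h)) + b2 - y
        loss += err * err / n
        dp = 2.0 * err / n
        for j in range(h):
            da = dp * w2[j] * (1.0 - t[j] ** 2)
            g[j] += da * x          # d loss / d w1[j]
            g[h + j] += da          # d loss / d b1[j]
            g[2 * h + j] += dp * t[j]  # d loss / d w2[j]
        g[3 * h] += dp              # d loss / d b2
    # Sparsity penalty on the decoded weights (subgradient at zero).
    loss += lam * sum(abs(v) for v in theta)
    g = [gi + lam * math.copysign(1.0, v) for gi, v in zip(g, theta)]
    # Chain rule through the linear decoder: grad_z = D^T g.
    grad_z = [sum(D[i][k] * g[i] for i in range(len(theta))) for k in range(len(z))]
    return loss, grad_z

def search_latent(xs, ys, h=8, latent_dim=4, lam=1e-3, steps=200, lr=0.1, seed=0):
    """Randomly initialize a latent point and update it by gradient descent."""
    rng = random.Random(seed)
    n_params = 3 * h + 1
    D = [[rng.gauss(0.0, 0.5) for _ in range(latent_dim)] for _ in range(n_params)]
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    loss, grad = loss_and_grad(z, D, xs, ys, h, lam)
    history = [loss]
    for _ in range(steps):
        z_new = [zk - lr * gk for zk, gk in zip(z, grad)]
        new_loss, new_grad = loss_and_grad(z_new, D, xs, ys, h, lam)
        if new_loss < loss:
            z, loss, grad = z_new, new_loss, new_grad  # accept the step
        else:
            lr *= 0.5  # reject the step and shrink the step size
        history.append(loss)
    return z, D, history

if __name__ == "__main__":
    xs = [-1.0 + 2.0 * i / 31 for i in range(32)]
    ys = [math.sin(3.0 * x) for x in xs]
    z, D, history = search_latent(xs, ys)
    print("initial loss:", history[0], "final loss:", history[-1])
```

Because only the latent point `z` is updated, every weight of the decoded network moves together along directions the decoder allows. In the paper's setting the decoder would also determine the architecture, which this fixed-shape sketch does not attempt.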
