LGMar 28, 2025

Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning

Deshani Geethika Poddenige, Sachith Seneviratne, Damith Senanayake, Mahesan Niranjan, PN Suganthan, Saman Halgamuge

arXiv:2503.22063v14.1h-index: 15

Originality Incremental advance

AI Analysis

This addresses a bottleneck in NAS for researchers and practitioners by reducing inefficiencies in architecture generation, though it is incremental as it builds on existing VAE and LLM methods.

The paper tackles the problem of generating invalid or duplicate neural architectures in Neural Architecture Search (NAS) by introducing a Vector Quantized Variational Autoencoder (VQ-VAE) to learn discrete latent representations, combined with a Large Language Model for sequence generation, resulting in improvements of over 80% in valid and unique architectures on NAS-Bench-101 and over 8% on NAS-Bench-201.

Unsupervised representation learning has been widely explored across various modalities, including neural architectures, where it plays a key role in downstream applications like Neural Architecture Search (NAS). These methods typically learn an unsupervised representation space before generating/ sampling architectures for the downstream search. A common approach involves the use of Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space, however, sampling from these spaces often leads to a high percentage of invalid or duplicate neural architectures. This could be due to the unnatural mapping of inherently discrete architectural space onto a continuous space, which emphasizes the need for a robust discrete representation of these architectures. To address this, we introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space more naturally aligned with the discrete neural architectures. In contrast to VAEs, VQ-VAEs (i) map each architecture into a discrete code sequence and (ii) allow the prior to be learned by any generative model rather than assuming a normal distribution. We then represent these architecture latent codes as numerical sequences and train a text-to-text model leveraging a Large Language Model to learn and generate sequences representing architectures. We experiment our method with Inception/ ResNet-like cell-based search spaces, namely NAS-Bench-101 and NAS-Bench-201. Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NASBench-101 and over 8% on NASBench-201. Finally, we demonstrate the applicability of our method in NAS employing a sequence-modeling-based NAS algorithm.

View on arXiv PDF

Similar