LG CL ML
May 6, 2020

Learning Architectures from an Extended Search Space for Language Modeling

arXiv:2005.02593v2
1000 citations
AI Analysis

This work addresses a bottleneck for NAS researchers and practitioners: restricted search spaces. It offers incremental improvements by enabling more comprehensive architecture learning.

The paper tackles the problem of limited search spaces in neural architecture search (NAS) by extending the search space to cover both intra-cell and inter-cell architectures. It achieves a new state of the art on PTB for language modeling and shows transferability to tasks such as named entity recognition and chunking.

Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a better search result, we design a joint learning method to perform intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture search system. For recurrent neural language modeling, it outperforms a strong baseline significantly on the PTB and WikiText data, with a new state-of-the-art on PTB. Moreover, the learned architectures show good transferability to other systems. E.g., they improve state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and CoNLL chunking task, indicating a promising line of research on large-scale pre-learned architectures.
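The abstract mentions a differentiable search over both intra-cell and inter-cell architectures. The following is a minimal sketch of the general idea behind differentiable (softmax-relaxed) architecture search, not the paper's implementation. The candidate operations, the names `mixed_op`, `mixed_connection`, `relaxed_cell`, and `discretize`, and the scalar-valued states are all assumptions made for illustration. In a joint search, the intra-cell logits `alpha` and the inter-cell logits `beta` would be optimized together by gradient descent; only the forward pass and the final discretization are shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations for one intra-cell edge (an assumption,
# not the paper's operation set).
INTRA_OPS = {
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "relu": lambda x: max(0.0, x),
    "identity": lambda x: x,
}


def mixed_op(x, alpha):
    """Intra-cell relaxation: softmax-weighted sum of every candidate op."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, INTRA_OPS.values()))


def mixed_connection(prev_states, beta):
    """Inter-cell relaxation: softmax-weighted sum of earlier cell outputs."""
    weights = softmax(beta)
    return sum(w * h for w, h in zip(weights, prev_states))


def relaxed_cell(x, prev_states, alpha, beta):
    """One relaxed cell: combine earlier cells, then apply the mixed op."""
    context = mixed_connection(prev_states, beta)
    return mixed_op(x + context, alpha)


def discretize(logits, names):
    """After search, keep only the candidate with the largest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return names[best]
```

With all logits at zero, every candidate receives equal weight, so the relaxed cell starts as an average of its options. Training would then shift weight toward the useful ones before `discretize` selects a single operation or connection.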

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
