Dezhou Shen

3 papers

4 citations

Novelty 40%

AI Score 19

Ranked #197,785 of 205,806 authors (top 96%); #31,685 in CL (top 98%)

3 Papers

CL · Apr 9, 2022

FoundationLayerNorm: Scaling BERT and GPT to 1,000 Layers

Dezhou Shen

Mainstream BERT/GPT models contain only 10 to 20 layers, and little literature discusses training deep BERT/GPT models. This paper proposes a simple yet effective method to stabilize BERT and GPT training. We successfully scale BERT and GPT to 1,000 layers, an order of magnitude deeper than previous BERT and GPT models. The proposed method, FoundationLayerNormalization, enables efficient training of deep neural networks and is validated at the 1,000-layer scale.

CL · Nov 9, 2021

FPM: A Collection of Large-scale Foundation Pre-trained Language Models

Dezhou Shen

Large-scale Transformer models have significantly advanced natural language processing applications. However, little effort has been made to unify the effective models. In this paper, with the aim of providing a new set of baseline models for future work, we adopt various novel transformer architectures and launch a model set with the help of recent mainstream technologies. We focus the discussion on optimizing network depth based on existing powerful encoder-decoder structures. We show that, by properly avoiding training defects such as non-convergence and degradation, scaling up off-the-shelf transformer architectures consistently delivers better performance. To stimulate future research on large-scale language model pretraining, we present extensive results and detailed discussions of how network performance improves with network depth, and we confirm the existence of an optimal number of layers for specific tasks. To the best of our knowledge, we provide the largest Chinese generative model and the largest Chinese encoding model. The BERT language models we trained on English datasets deliver a 14.45% higher F1 score than Turing-NLR.

SI · Jun 24, 2020

Movie Box office Prediction via Joint Actor Representations and Social Media Sentiment

Dezhou Shen

In recent years, driven by Asian film markets such as China and India, the global box office has maintained steady growth. Previous studies have rarely used long-term, full-sample film data and lack research on actors' social networks. Existing box office prediction algorithms use only film metadata, ignore social network characteristics, and are less interpretable. I propose an FC-GRU-CNN binary classification model for the box office prediction task that combines five characteristics: film metadata, Sina Weibo text sentiment, actors' social network measurements, all-pairs shortest paths, and actors' artistic contribution. By exploiting the long-term memory ability of the GRU layer on long sequences and the mapping ability of the CNN layer in extracting features from the all-pairs shortest path matrix, the proposed model achieves 14% higher accuracy than the current best C-LSTM model.
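
To illustrate the all-pairs shortest path feature named in this abstract, here is a minimal sketch that builds a shortest-path matrix over a small actor graph. The graph, the actor labels, the function names, and the choice of breadth-first search on an unweighted graph are all assumptions made for illustration; the abstract does not say how the paper builds or weights its actor network.

```python
from collections import deque


def all_pairs_shortest_paths(adjacency):
    """Run a breadth-first search from every node of an unweighted graph.

    Returns dist[u][v] = number of hops from u to v. Unreachable pairs
    are simply absent from the inner dictionary.
    """
    dist = {}
    for source in adjacency:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    queue.append(neighbor)
        dist[source] = hops
    return dist


def to_matrix(dist, nodes, unreachable=-1):
    """Lay the distances out as a square matrix in the given node order."""
    return [[dist[u].get(v, unreachable) for v in nodes] for u in nodes]


# Hypothetical co-starring graph: an edge means two actors shared a film.
co_star_graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

paths = all_pairs_shortest_paths(co_star_graph)
matrix = to_matrix(paths, ["A", "B", "C", "D"])
```

A square matrix of this kind is the sort of input a convolutional layer could scan for features; how the paper actually encodes and feeds it to the CNN is not stated here.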