NA · Jun 2
Three-term recurrence iterations for energy-based models
R. Altmann, J. Ramme, P. Schulze
It is well known that the midpoint rule preserves the dissipation inequality when applied to a certain class of energy-based models. We introduce an appropriate scaling of the state variables such that the symmetric part of the resulting iteration matrix is guaranteed to be positive definite. This allows the application of three-term iteration schemes such as the methods of Widlund and Rapoport. Special emphasis is put on examples where the symmetric part is block diagonal, so that the computations decouple. This leads to efficient dissipation-preserving numerical schemes, as illustrated in two numerical examples, namely the biharmonic heat equation and linear poroelasticity.
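The abstract does not spell out the iteration itself, so the following is only a rough sketch under simplifying assumptions. It takes the simplest energy-based model x' = (J - R) x, with J skew-symmetric and R symmetric positive semidefinite, and an identity energy matrix, so no scaling step is needed here (an assumption; the paper's scaling handles more general models). One midpoint step then requires solving (I + τ/2 R - τ/2 J) x_{n+1} = (I + τ/2 (J - R)) x_n. The symmetric part I + τ/2 R is positive definite, so the system can be solved with the Concus-Golub-Widlund three-term recurrence, which only needs solves with the symmetric part. The helper names `widlund_cg` and `midpoint_step`, the random matrices, and the dense `np.linalg.solve` call are illustrative choices, not the authors' implementation. Rapoport's minimal-residual variant is not shown.

```python
import numpy as np


def widlund_cg(M, N, b, tol=1e-10, maxiter=500):
    """Three-term (Concus-Golub-Widlund) iteration for (M - N) x = b.

    Assumes M is symmetric positive definite and N is skew-symmetric,
    so each step only needs a solve with the symmetric part M.
    """
    A = M - N
    x_old = np.zeros_like(b)
    x = np.zeros_like(b)
    omega, rho_old = 1.0, None
    for _ in range(maxiter):
        r = b - A @ x                      # true residual, recomputed each step
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = np.linalg.solve(M, r)          # preconditioned residual: M z = r
        rho = z @ r                        # equals z^T M z > 0
        if rho_old is not None:            # omega_1 = 1, then the CGW update
            omega = 1.0 / (1.0 + rho / (rho_old * omega))
        x, x_old = x_old + omega * (z + x - x_old), x
        rho_old = rho
    return x


def midpoint_step(J, R, x, tau):
    """One implicit-midpoint step for x' = (J - R) x (hypothetical model, identity energy)."""
    n = len(x)
    M = np.eye(n) + 0.5 * tau * R          # symmetric part: SPD because R is PSD
    N = 0.5 * tau * J                      # skew-symmetric part
    b = x + 0.5 * tau * (J - R) @ x        # explicit half of the midpoint rule
    return widlund_cg(M, N, b)


rng = np.random.default_rng(0)
n, tau = 30, 0.1
B = rng.standard_normal((n, n))
J = 0.5 * (B - B.T)                        # skew-symmetric "interconnection"
C = rng.standard_normal((n, n))
R = C @ C.T / n                            # symmetric positive semidefinite "dissipation"
x = rng.standard_normal(n)
energies = [0.5 * x @ x]                   # energy H(x) = |x|^2 / 2
for _ in range(20):
    x = midpoint_step(J, R, x, tau)
    energies.append(0.5 * x @ x)
```

Up to the solver tolerance, the recorded energies should be non-increasing. Pairing the midpoint update with (x_{n+1} + x_n)/2 gives H(x_{n+1}) - H(x_n) = -τ m^T R m ≤ 0, where m = (x_{n+1} + x_n)/2.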
CL · May 11, 2021
Benchmarking down-scaled (not so large) pre-trained language models
M. Aßenmacher, P. Schulze, C. Heumann
Large Transformer-based language models are pre-trained on corpora of varying sizes, for different numbers of steps, and with different batch sizes. At the same time, more fundamental components, such as the pre-training objective or architectural hyperparameters, are modified. It is therefore difficult to ascribe changes in performance to specific factors. Since searching the hyperparameter space over the full systems is too costly, we pre-train down-scaled versions of several popular Transformer-based architectures on a common pre-training corpus and benchmark them on a subset of the GLUE tasks (Wang et al., 2018). Specifically, we systematically compare three pre-training objectives for different shape parameters and model sizes, while also varying the number of pre-training steps and the batch size. In our experiments, MLM + NSP (BERT-style) consistently outperforms MLM (RoBERTa-style) as well as the standard LM objective. Furthermore, we find that additional compute should mainly be allocated to an increased model size, while training for more steps is inefficient. Based on these observations, as a final step, we attempt to scale up several systems using compound scaling (Tan and Le, 2019) adapted to Transformer-based language models.
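The abstract does not say how compound scaling was adapted to Transformers, so the following is a hypothetical sketch, not the authors' procedure. In the original formulation (Tan and Le, 2019), depth, width, and resolution are scaled jointly by α^φ, β^φ, and γ^φ under a fixed compute budget. Here the sketch assumes that a language model has no resolution axis and that per-token compute grows roughly linearly in the number of layers and quadratically in the hidden size. Under those assumptions the constraint becomes α·β² ≈ 2, so each increment of φ roughly doubles compute. The function name `compound_scale`, the values α = 1.2 and β = 1.29, and the rounding to multiples of a 64-dimensional attention head are all illustrative assumptions.

```python
def compound_scale(base_layers, base_hidden, phi, alpha=1.2, beta=1.29, head_dim=64):
    """Hypothetical compound scaling of a Transformer's depth and width.

    Depth is multiplied by alpha**phi and width by beta**phi; with
    alpha * beta**2 ~= 2, compute (~ layers * hidden**2) roughly doubles per phi.
    """
    layers = max(1, round(base_layers * alpha ** phi))
    # keep the hidden size a multiple of the (assumed) attention head dimension
    hidden = max(head_dim, head_dim * round(base_hidden * beta ** phi / head_dim))
    return layers, hidden


# scale a small, hypothetical base model of 4 layers with hidden size 256
for phi in range(4):
    print(phi, compound_scale(4, 256, phi))
```

Rounding to whole layers and head-aligned widths means the realized compute only approximately follows the 2^φ target.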
CL · Apr 6, 2021
A Bayesian approach to modeling topic-metadata relationships
P. Schulze, S. Wiegrebe, P. W. Thurner et al.
The objective of advanced topic modeling is not only to explore latent topical structures, but also to estimate relationships between the discovered topics and theoretically relevant metadata. Methods used to estimate such relationships must take into account that the topical structure is not directly observed but is itself estimated in an unsupervised fashion, usually by common topic models. A frequently used procedure to achieve this is the method of composition, a Monte Carlo sampling technique that performs repeated linear regressions of sampled topic proportions on metadata covariates. In this paper, we propose two modifications of this approach. First, we substantially refine the existing implementation of the method of composition from the R package stm by replacing linear regression with the more appropriate Beta regression. Second, we provide a fundamental enhancement of the entire estimation framework by substituting the current blend of frequentist and Bayesian methods with a fully Bayesian approach. This allows for a more appropriate quantification of uncertainty. We illustrate our improved methodology by investigating relationships between Twitter posts by German parliamentarians and different metadata covariates related to their electoral districts, using the Structural Topic Model to estimate topic proportions.
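For orientation, here is a minimal sketch of the baseline that the paper refines: the method of composition with linear regression. It is not the authors' Beta-regression or fully Bayesian method. Each round draws one set of topic proportions from their approximate posterior and fits an OLS regression on the covariates. It then samples one coefficient vector from the normal approximation N(β̂, σ̂²(XᵀX)⁻¹). The pooled draws approximate the coefficients' distribution with topic-estimation uncertainty propagated. The function name `method_of_composition`, the simulated covariate, and the Beta "posterior" used as a stand-in for draws from a fitted topic model are hypothetical.

```python
import numpy as np


def method_of_composition(theta_sampler, X, n_draws=200, rng=None):
    """Method of composition for one topic's proportions regressed on covariates X.

    theta_sampler(rng) returns one posterior draw of the topic proportion for
    every document. Each draw is regressed on X by OLS, and one coefficient
    vector is sampled from N(beta_hat, sigma2 * (X^T X)^{-1}).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    draws = np.empty((n_draws, p))
    for m in range(n_draws):
        theta = theta_sampler(rng)                 # step 1: sample topic proportions
        beta_hat = XtX_inv @ X.T @ theta           # step 2: OLS fit on the covariates
        resid = theta - X @ beta_hat
        sigma2 = resid @ resid / (n - p)
        # step 3: sample coefficients from their normal sampling distribution
        draws[m] = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    return draws


rng = np.random.default_rng(1)
n_docs = 500
x = rng.uniform(size=n_docs)                       # one hypothetical covariate
X = np.column_stack([np.ones(n_docs), x])          # intercept + covariate
mu = 0.3 + 0.4 * x                                 # assumed true mean proportion
conc = 50.0                                        # assumed posterior concentration


def sample_theta(rng):
    # stand-in for draws from a fitted topic model's posterior (Beta here)
    return rng.beta(conc * mu, conc * (1 - mu))


draws = method_of_composition(sample_theta, X, n_draws=200, rng=rng)
print(draws.mean(axis=0))                          # should be close to [0.3, 0.4]
```

Because proportions lie in (0, 1), the linear model in step 2 can imply fitted values outside that range, which is the motivation the abstract gives for switching to Beta regression.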