Leon Strømberg-Derczynski

2 Papers

LG | Mar 17, 2021
Set-to-Sequence Methods in Machine Learning: a Review

Mateusz Jurewicz, Leon Strømberg-Derczynski

Machine learning on sets towards sequential output is an important and ubiquitous task, with applications ranging from language modeling and meta-learning to multi-agent strategy games and power grid optimization. Combining elements of representation learning and structured prediction, its two primary challenges include obtaining a meaningful, permutation invariant set representation and subsequently utilizing this representation to output a complex target permutation. This paper provides a comprehensive introduction to the field as well as an overview of important machine learning methods tackling both of these key challenges, with a detailed qualitative comparison of selected model architectures.

CL | May 7, 2020
The Danish Gigaword Project

Leon Strømberg-Derczynski, Manuel R. Ciosici, Rebekah Baglini et al.

Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword Corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects.