Matthias Wolff

h-index: 16

6 papers

29 citations

Novelty: 44%

AI Score: 24

Ranked #172,807 of 194,257 authors (top 89%) · #28,808 in CL (top 94%)

6 Papers

5.3 · LG · Nov 13, 2023

How to Do Machine Learning with Small Data? -- A Review from an Industrial Perspective

Ivan Kraljevski, Yong Chul Ju, Dmitrij Ivanov et al.

Artificial intelligence has experienced a technological breakthrough in science, industry, and everyday life in recent decades. The advancements can be credited to the ever-increasing availability and miniaturization of computational resources, which have resulted in exponential data growth. However, because the amount of data is insufficient in some cases, employing machine learning to solve complex tasks is not straightforward, or even impossible. As a result, machine learning with small data is gaining importance in data science and in applications across several fields. The authors focus on interpreting the general term "small data" and its role in engineering and industrial applications. They give a brief overview of the most important industrial applications of machine learning and small data. Small data is defined in terms of various characteristics compared to big data, and a machine learning formalism is introduced. Five critical challenges of machine learning with small data in industrial applications are presented: unlabeled data, imbalanced data, missing data, insufficient data, and rare events. Based on those definitions, an overview of the considerations in domain representation and data acquisition is given, along with a taxonomy of machine learning approaches in the context of small data.

0.5 · CL · Nov 3, 2023

Minimalist Grammar: Construction without Overgeneration

Isidor Konrad Maier, Johannes Kuhn, Jesse Beisegel et al.

In this paper we give instructions on how to write a minimalist grammar (MG). In order to present the instructions as an algorithm, we use a variant of context-free grammars (CFG) as an input format. We can exclude overgeneration if the CFG has no recursion, i.e. no non-terminal can (indirectly) derive a right-hand side containing itself. The constructed MGs use licensors/-ees as a special way of handling exceptions. A CFG format for a derivation $A\_eats\_B\mapsto^* peter\_eats\_apples$, where $A$ and $B$ generate noun phrases, normally leads to overgeneration, e.g., $i\_eats\_apples$. To avoid overgeneration, a CFG would need many non-terminal symbols and rules that mainly produce the same word, just to handle exceptions. In our MGs, however, we can summarize CFG rules that produce the same word in one item and handle exceptions by a proper distribution of licensees/-ors. The difficulty with this technique is that in most generations the majority of licensees/-ors are not needed but still have to be triggered somehow. We solve this problem with $ε$-items called \emph{adapters}.
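The no-recursion condition can be checked mechanically. The following is a minimal Python sketch, not the authors' algorithm. It assumes a hypothetical encoding of a CFG as a dictionary from each non-terminal to its alternative right-hand sides (symbols that are not keys are treated as terminals) and reports recursion by detecting a cycle in the non-terminal dependency graph with a depth-first search.

```python
from typing import Dict, List

# Hypothetical CFG encoding: non-terminal -> list of alternative right-hand sides.
# Any symbol that is not a key of the dictionary is treated as a terminal.
Grammar = Dict[str, List[List[str]]]


def is_recursive(grammar: Grammar) -> bool:
    """Return True if some non-terminal can (indirectly) derive a form containing itself."""
    # Dependency graph: A -> B whenever B occurs in some right-hand side of A.
    edges = {
        a: {s for rhs in rhss for s in rhs if s in grammar}
        for a, rhss in grammar.items()
    }
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current DFS path / finished
    color = {a: WHITE for a in grammar}

    def visit(a: str) -> bool:
        color[a] = GREY
        for b in edges[a]:
            if color[b] == GREY:  # back edge: b can derive itself
                return True
            if color[b] == WHITE and visit(b):
                return True
        color[a] = BLACK  # no cycle passes through a
        return False

    return any(color[a] == WHITE and visit(a) for a in grammar)
```

For the $A\_eats\_B$ example, a grammar such as `{"S": [["A", "eats", "B"]], "A": [["peter"], ["i"]], "B": [["apples"]]}` is non-recursive, whereas a rule like `NP -> NP and NP` would make `is_recursive` return True.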

0.5 · CL · Dec 14, 2023

Arithmetics-Based Decomposition of Numeral Words -- Arithmetic Conditions give the Unpacking Strategy

Isidor Konrad Maier, Matthias Wolff

This paper presents a novel numeral decomposer based on arithmetic criteria. The criteria do not depend on a base-10 assumption but only on Hurford's Packing Strategy, which constitutes numerals by packing factors and summands into multiplicators. We found that a numeral of value $n$ has a multiplicator larger than $\sqrt{n}$, a summand smaller than $n/2$ and a factor smaller than $\sqrt{n}$. Using these findings, the numeral decomposer attempts to detect and unpack factors and summands in order to reverse Hurford's Packing Strategy. We tested its applicability for incremental unsupervised grammar induction in 273 languages. In this way, we obtained grammars with sensible mathematical attributes that explain the structure of the produced numerals. The grammars induced by the numeral decomposer are often close to expert-made ones and more compact than numeral grammars induced by a state-of-the-art grammar induction tool. Furthermore, this paper reports on the few cases of incorrectly induced mathematical attributes, which are often linked to linguistic peculiarities such as context sensitivity.
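The three arithmetic conditions can be stated as a simple check on a candidate decomposition $n = factor \cdot multiplicator + summand$. This is a minimal sketch of the conditions only, not the paper's decomposer. The function name is hypothetical, and the convention of using `factor = 1` or `summand = 0` when a numeral lacks that part is an assumption. The square-root comparisons are done in integer arithmetic to avoid floating-point rounding.

```python
def satisfies_packing_conditions(n: int, factor: int, multiplicator: int, summand: int) -> bool:
    """Check n = factor * multiplicator + summand against the stated conditions.

    Assumption: pass factor=1 or summand=0 when the numeral has no such part;
    all quantities are non-negative integers.
    """
    if factor * multiplicator + summand != n:
        return False
    return (
        multiplicator * multiplicator > n  # multiplicator > sqrt(n)
        and 2 * summand < n                # summand < n / 2
        and factor * factor < n            # factor < sqrt(n)
    )


print(satisfies_packing_conditions(23, 2, 10, 3))  # "twenty-three" as 2 * 10 + 3
print(satisfies_packing_conditions(23, 10, 2, 3))  # roles swapped
```

The first call prints `True`; the second prints `False`, because a multiplicator of 2 is not larger than $\sqrt{23}$, so the conditions single out 10 as the multiplicator.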

3.3 · FL · Apr 30, 2020

Reinforcement learning of minimalist grammars

Peter beim Graben, Ronald Römer, Werner Meyer et al.

Speech-controlled user interfaces make it easier for laypeople to operate devices and household functions. State-of-the-art language technology scans the acoustically analyzed speech signal for relevant keywords, which are subsequently inserted into semantic slots to interpret the user's intent. To develop proper cognitive information and communication technologies, simple slot-filling should be replaced by utterance meaning transducers (UMT) based on semantic parsers and a mental lexicon that comprises the syntactic, phonetic and semantic features of the language under consideration. This lexicon must be acquired by a cognitive agent during interaction with its users. We outline a reinforcement learning algorithm for the acquisition of the syntax and semantics of English utterances, based on minimalist grammar (MG), a recent computational implementation of generative linguistics. A teacher presents English declarative sentences to the agent in the form of utterance meaning pairs (UMP), where the meanings are encoded as formulas of predicate logic. Since MG codifies universal linguistic competence through inference rules, thereby separating innate linguistic knowledge from the contingently acquired lexicon, our approach unifies generative grammar and reinforcement learning, hence potentially resolving the still open Chomsky-Skinner controversy.

0.5 · CL · Mar 11, 2020

Vector symbolic architectures for context-free grammars

Peter beim Graben, Markus Huber, Werner Meyer et al.

Background / introduction. Vector symbolic architectures (VSA) are a viable approach for the hyperdimensional representation of symbolic data, such as documents, syntactic structures, or semantic frames. Methods. We present a rigorous mathematical framework for the representation of phrase structure trees and parse trees of context-free grammars (CFG) in Fock space, i.e. the infinite-dimensional Hilbert space used in quantum field theory. We define a novel normal form for CFG by means of term algebras. Using a recently developed software toolbox called FockBox, we construct Fock space representations for the trees built up by a CFG left-corner (LC) parser. Results. We prove a universal representation theorem for CFG term algebras in Fock space and illustrate our findings through a low-dimensional principal component projection of the LC parser states. Conclusions. Our approach could advance the development of VSA for explainable artificial intelligence (XAI) by means of hyperdimensional deep neural computation. It could be significant for improving cognitive user interfaces and other applications of VSA in machine learning.
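To give a rough idea of how trees can live in a direct sum of tensor spaces of increasing rank, here is a minimal NumPy sketch in the spirit of Smolensky-style tensor product representations. It is an assumption-laden illustration, not the paper's term-algebra construction and not the FockBox API. The symbol inventory, the one-hot filler vectors, the two orthonormal role vectors, and the functions `encode`, `unbind` and `root_label` are all hypothetical. A "Fock vector" here is a dictionary from tensor rank to array, and each child subtree is bound to its position by a tensor product with a role vector, which raises the rank of all its components by one.

```python
import numpy as np

# Hypothetical symbol inventory with orthonormal (one-hot) filler vectors.
SYMBOLS = ["S", "NP", "VP", "peter", "eats", "apples"]
FILLER = {s: np.eye(len(SYMBOLS))[i] for i, s in enumerate(SYMBOLS)}
# Orthonormal role vectors for child positions 0 and 1.
ROLES = np.eye(2)


def bind(fock, role):
    """Tensor every component with a role vector: rank k becomes rank k + 1."""
    return {k + 1: np.multiply.outer(v, role) for k, v in fock.items()}


def add(a, b):
    """Direct-sum addition: add components of equal rank."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out[k] + v if k in out else v
    return out


def encode(tree):
    """A tree is a symbol (leaf) or a tuple (label, child0, child1, ...)."""
    if isinstance(tree, str):
        return {1: FILLER[tree]}
    label, *children = tree
    fock = {1: FILLER[label]}
    for i, child in enumerate(children):
        fock = add(fock, bind(encode(child), ROLES[i]))
    return fock


def unbind(fock, role):
    """Inverse of bind for orthonormal roles: contract the last axis with the role."""
    return {k - 1: v @ role for k, v in fock.items() if k >= 2}


def root_label(fock):
    """Read off the symbol stored in the rank-1 component."""
    return SYMBOLS[int(np.argmax(fock[1]))]
```

For instance, encoding `("S", ("NP", "peter"), ("VP", "eats", ("NP", "apples")))` yields components of ranks 1 to 4, and unbinding role 1 and then role 0 recovers the label `eats`. Orthonormal roles are chosen so that unbinding is exact; with random high-dimensional roles, as is common in VSA, decoding would be approximate.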

0.6 · CL · Jun 11, 2019

Reinforcement Learning of Minimalist Numeral Grammars

Peter beim Graben, Ronald Römer, Werner Meyer et al.

Speech-controlled user interfaces make it easier for laypeople to operate devices and household functions. State-of-the-art language technology scans the acoustically analyzed speech signal for relevant keywords, which are subsequently inserted into semantic slots to interpret the user's intent. To develop proper cognitive information and communication technologies, simple slot-filling should be replaced by utterance meaning transducers (UMT) based on semantic parsers and a \emph{mental lexicon} that comprises the syntactic, phonetic and semantic features of the language under consideration. This lexicon must be acquired by a cognitive agent during interaction with its users. We outline a reinforcement learning algorithm for the acquisition of the syntactic morphology and arithmetic semantics of English numerals, based on minimalist grammar (MG), a recent computational implementation of generative linguistics. A teacher presents number words to the agent in the form of utterance meaning pairs (UMP), where the meanings are encoded as arithmetic terms from a suitable term algebra. Since MG encodes universal linguistic competence through inference rules, thereby separating innate linguistic knowledge from the contingently acquired lexicon, our approach unifies generative grammar and reinforcement learning, hence potentially resolving the still open Chomsky-Skinner controversy.
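As an illustration of what an utterance meaning pair with an arithmetic-term meaning might look like, here is a minimal Python sketch of a tiny term algebra with constants, addition and multiplication, plus an evaluator. The class names, the `evaluate` function and the sample UMPs (including the analysis of "twenty" as $2 \cdot 10$) are hypothetical choices for illustration. They are not taken from the paper, and the sketch does not model the reinforcement learning or the MG lexicon.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Plus:
    left: "Term"
    right: "Term"


@dataclass(frozen=True)
class Times:
    left: "Term"
    right: "Term"


# Terms of the (hypothetical) arithmetic term algebra.
Term = Union[Num, Plus, Times]


def evaluate(term: Term) -> int:
    """Interpret a term as the integer it denotes."""
    if isinstance(term, Num):
        return term.value
    if isinstance(term, Plus):
        return evaluate(term.left) + evaluate(term.right)
    if isinstance(term, Times):
        return evaluate(term.left) * evaluate(term.right)
    raise TypeError(f"not a term: {term!r}")


# Hypothetical utterance meaning pairs (utterance, meaning term).
UMPS = [
    ("seven", Num(7)),
    ("twenty three", Plus(Times(Num(2), Num(10)), Num(3))),
    ("three hundred", Times(Num(3), Num(100))),
]
```

Keeping the meaning as a structured term rather than a plain integer preserves the arithmetic composition (factor, multiplicator, summand) that a learner could align with the parts of the utterance; `evaluate` only recovers the denoted value.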