LGJun 13, 2018Code
Polynomial Regression As an Alternative to Neural NetsXi Cheng, Bohdan Khomtchouk, Norman Matloff et al.
Despite the success of neural networks (NNs), there is still a concern among many over their "black box" nature. Why do they work? Here we present a simple analytic argument that NNs are in fact essentially polynomial regression models. This view will have various implications for NNs, e.g. providing an explanation for why convergence problems arise in NNs, and it gives rough guidance on avoiding overfitting. In addition, we use this phenomenon to predict and confirm a multicollinearity property of NNs not previously reported in the literature. Most importantly, given this loose correspondence, one may choose to routinely use polynomial models instead of NNs, thus avoiding some major problems of the latter, such as having to set many tuning parameters and dealing with convergence issues. We present a number of empirical results; in each case, the accuracy of the polynomial approach matches or exceeds that of NN approaches. A many-featured, open-source software package, polyreg, is available.
LGJan 28, 2024
Contrastive Learning and Mixture of Experts Enables Precise Vector EmbeddingsLogan Hallee, Rohan Kapur, Arjun Patel et al.
The advancement of transformer neural networks has significantly elevated the capabilities of sentence similarity models, but they still struggle with highly discriminative tasks and may produce sub-optimal representations of important documents like scientific literature. With the increased reliance on retrieval augmentation and search, representing diverse documents as concise and descriptive vectors is crucial. This paper improves upon the vectors embeddings of scientific text by assembling niche datasets using co-citations as a similarity metric, focusing on biomedical domains. We apply a novel Mixture of Experts (MoE) extension pipeline to pretrained BERT models, where every multi-layer perceptron section is enlarged and copied into multiple distinct experts. Our MoE variants perform well over $N$ scientific domains with $N$ dedicated experts, whereas standard BERT models excel in only one domain at a time. Notably, extending just a single transformer block to MoE captures 85% of the benefit seen from full MoE extension at every layer. This holds promise for versatile and efficient One-Size-Fits-All transformer networks for numerically representing diverse inputs. Our methodology marks advancements in representation learning and holds promise for enhancing vector database search and compilation.
CLNov 30, 2018
Modeling natural language emergence with integral transform theory and reinforcement learningBohdan Khomtchouk, Shyam Sudhakaran
Zipf's law predicts a power-law relationship between word rank and frequency in language communication systems and has been widely reported in a variety of natural language processing applications. However, the emergence of natural language is often modeled as a function of bias between speaker and listener interests, which lacks a direct way of relating information-theoretic bias to Zipfian rank. A function of bias also serves as an unintuitive interpretation of the communicative effort exchanged between a speaker and a listener. We counter these shortcomings by proposing a novel integral transform and kernel for mapping communicative bias functions to corresponding word frequency-rank representations at any arbitrary phase transition point, resulting in a direct way to link communicative effort (modeled by speaker/listener bias) to specific vocabulary used (represented by word rank). We demonstrate the practical utility of our integral transform by showing how a change from bias to rank results in greater accuracy and performance at an image classification task for assigning word labels to images randomly subsampled from CIFAR10. We model this task as a reinforcement learning game between a speaker and listener and compare the relative impact of bias and Zipfian word rank on communicative performance (and accuracy) between the two agents.