LG, Apr 21, 2022
Accelerating Machine Learning via the Weber-Fechner Law
B. N. Kausik
The Weber-Fechner Law observes that human perception scales as the logarithm of the stimulus. We argue that learning algorithms for human concepts could benefit from the Weber-Fechner Law. Specifically, we impose Weber-Fechner on simple neural networks, with or without convolution, via the logarithmic power series of their sorted output. Our experiments show surprising performance and accuracy on the MNIST data set within a few training iterations and limited computational resources, suggesting that Weber-Fechner can accelerate machine learning of human concepts.
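As a minimal illustration of the law itself (not of the paper's construction, whose logarithmic power series of the sorted output is not specified in this abstract), the classical form can be written as p = k ln(S / S0). The function name `perceived_intensity`, the threshold `S0 = 1.0`, and the constant `k = 1.0` are assumptions chosen for illustration only.

```python
import math

def perceived_intensity(stimulus, threshold=1.0, k=1.0):
    """Classical Weber-Fechner form: p = k * ln(S / S0).

    `threshold` (S0) and `k` are illustrative values, not taken from the paper.
    """
    if stimulus <= 0 or threshold <= 0:
        raise ValueError("stimulus and threshold must be positive")
    return k * math.log(stimulus / threshold)

# Doubling the stimulus adds the same perceptual increment at any scale.
low_step = perceived_intensity(2.0) - perceived_intensity(1.0)
high_step = perceived_intensity(200.0) - perceived_intensity(100.0)
print(low_step, high_step)
```

Both differences equal k ln 2, which is the sense in which perception tracks the ratio of stimuli and not their absolute difference.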
CL, Feb 22, 2024
Scaling Efficient LLMs
B. N. Kausik
Trained LLMs in the transformer architecture are typically sparse in that most of the parameters are negligible, raising questions about efficiency. Furthermore, the so-called "AI scaling law" for transformers suggests that the number of parameters must scale linearly with the size of the data. In response, we inquire into efficient LLMs, i.e., those with the fewest parameters that achieve the desired accuracy on a training corpus. Specifically, by comparing theoretical and empirical estimates of the Kullback-Leibler divergence, we derive a natural AI scaling law that the number of parameters in an efficient LLM scales as $D^\gamma$, where $D$ is the size of the training data and $\gamma \in [0.44, 0.72]$, suggesting the existence of more efficient architectures. Against this backdrop, we propose recurrent transformers, combining the efficacy of transformers with the efficiency of recurrent networks, progressively applying a single transformer layer to a fixed-width sliding window across the input sequence. Recurrent transformers (a) run in linear time in the sequence length, (b) are memory-efficient and amenable to parallel processing in large batches, (c) learn to forget history for language tasks, or accumulate history for long-range tasks like copy and selective copy, and (d) are amenable to curriculum training to overcome vanishing gradients. In our experiments, we find that recurrent transformers perform favorably on benchmark tests.
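To make the scaling claim concrete, the sketch below compares how the parameter count would grow under the stated exponent range against linear scaling. Only the exponent bounds 0.44 and 0.72 come from the abstract; the proportionality constant is not given, so the code reports relative growth factors only, and the helper name `parameter_growth` is hypothetical.

```python
def parameter_growth(data_growth, gamma):
    """Factor by which parameters grow when data grows by `data_growth`,
    assuming parameters scale as D**gamma (unknown constant cancels out)."""
    return data_growth ** gamma

# Exponent bounds quoted in the abstract.
GAMMA_LOW, GAMMA_HIGH = 0.44, 0.72

for data_growth in (10, 100, 1000):
    low = parameter_growth(data_growth, GAMMA_LOW)
    high = parameter_growth(data_growth, GAMMA_HIGH)
    print(f"data x{data_growth}: parameters x{low:.1f} to x{high:.1f} "
          f"(linear scaling: x{data_growth})")
```

Under these assumptions, a tenfold increase in data calls for roughly 2.8 to 5.2 times as many parameters, compared with ten times as many under linear scaling.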