B. N. Kausik

LG
h-index: 1
3 papers
6 citations
Novelty: 52%
AI Score: 24

3 Papers

LG · Apr 21, 2022
Accelerating Machine Learning via the Weber-Fechner Law

B. N. Kausik

The Weber-Fechner Law observes that human perception scales as the logarithm of the stimulus. We argue that learning algorithms for human concepts could benefit from the Weber-Fechner Law. Specifically, we impose Weber-Fechner on simple neural networks, with or without convolution, via the logarithmic power series of their sorted output. Our experiments show surprising performance and accuracy on the MNIST data set within a few training iterations and limited computational resources, suggesting that Weber-Fechner can accelerate machine learning of human concepts.
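As a minimal illustration of the Weber-Fechner Law itself (not the paper's logarithmic power series construction, which the abstract does not spell out), the sketch below models perceived intensity as the logarithm of the stimulus. The function name `perceived_intensity` and the parameters `threshold` and `k` are assumptions introduced here for illustration.

```python
import math


def perceived_intensity(stimulus, threshold=1.0, k=1.0):
    """Weber-Fechner: perception = k * ln(stimulus / threshold).

    `threshold` (the smallest detectable stimulus) and `k` (a scale
    constant) are illustrative parameters, not values from the paper.
    """
    if stimulus <= 0 or threshold <= 0:
        raise ValueError("stimulus and threshold must be positive")
    return k * math.log(stimulus / threshold)


# Doubling the stimulus adds the same perceived increment at any scale.
step_small = perceived_intensity(20.0) - perceived_intensity(10.0)
step_large = perceived_intensity(2000.0) - perceived_intensity(1000.0)
```

Both `step_small` and `step_large` equal `k * ln(2)` up to floating-point error, which is the sense in which perception depends on stimulus ratios rather than absolute differences.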

LG · Aug 23, 2022
Psychophysical Machine Learning

B. N. Kausik

The Weber-Fechner Law of psychophysics observes that human perception is logarithmic in the stimulus. We present an algorithm for incorporating the Weber-Fechner Law into loss functions for machine learning, and use the algorithm to enhance the performance of deep learning networks.

CL · Feb 22, 2024
Scaling Efficient LLMs

B. N. Kausik

Trained LLMs in the transformer architecture are typically sparse in that most of the parameters are negligible, raising questions on efficiency. Furthermore, the so-called "AI scaling law" for transformers suggests that the number of parameters must scale linearly with the size of the data. In response, we inquire into efficient LLMs, i.e., those with the fewest parameters that achieve the desired accuracy on a training corpus. Specifically, by comparing theoretical and empirical estimates of the Kullback-Leibler divergence, we derive a natural AI scaling law that the number of parameters in an efficient LLM scales as $D^\gamma$, where $D$ is the size of the training data and $\gamma \in [0.44, 0.72]$, suggesting the existence of more efficient architectures. Against this backdrop, we propose recurrent transformers, combining the efficacy of transformers with the efficiency of recurrent networks, progressively applying a single transformer layer to a fixed-width sliding window across the input sequence. Recurrent transformers (a) run in linear time in the sequence length, (b) are memory-efficient and amenable to parallel processing in large batches, (c) learn to forget history for language tasks, or accumulate history for long-range tasks like copy and selective copy, and (d) are amenable to curriculum training to overcome vanishing gradients. In our experiments, we find that recurrent transformers perform favorably on benchmark tests.
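To make the stated scaling exponents concrete, the following sketch compares how the parameter count would grow under the abstract's sublinear law versus linear scaling when the training data grows tenfold. It is plain arithmetic on the exponents quoted above; the function name `parameter_growth` and the tenfold data factor are assumptions chosen for illustration.

```python
def parameter_growth(data_growth, gamma):
    """Factor by which parameters grow if their count scales as D**gamma.

    `data_growth` is the multiplicative increase in training data size D.
    """
    if data_growth <= 0:
        raise ValueError("data_growth must be positive")
    return data_growth ** gamma


# Exponent range quoted in the abstract.
GAMMA_LOW, GAMMA_HIGH = 0.44, 0.72

# Hypothetical scenario: the training corpus grows by a factor of 10.
low = parameter_growth(10.0, GAMMA_LOW)     # about 2.75
high = parameter_growth(10.0, GAMMA_HIGH)   # about 5.25
linear = parameter_growth(10.0, 1.0)        # linear scaling: 10
```

Under these exponents, ten times more data calls for roughly 2.75 to 5.25 times more parameters, compared with 10 times more under linear scaling.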