M.T. Manry

h-index: 27

5 papers

33 citations

Novelty: 45%

AI Score: 26

Ranked #158,871 of 194,257 authors (top 82%); #34,827 in LG (top 87%)

5 Papers

3.8 | LG | Mar 30, 2023

Optimal Input Gain: All You Need to Supercharge a Feed-Forward Neural Network

Chinmay Rane, Kanishka Tyagi, Sanjeev Malalur et al.

Linear transformation of the inputs alters the training performance of feed-forward networks that are otherwise equivalent. However, most linear transforms are viewed as a pre-processing operation separate from the actual training. Starting from equivalent networks, it is shown that pre-processing the inputs with a linear transformation is equivalent to multiplying the negative gradient matrix by an autocorrelation matrix in each training iteration. A second-order method is proposed to find the autocorrelation matrix that maximizes learning in a given iteration. When the autocorrelation matrix is diagonal, the method optimizes input gains. This optimal input gain (OIG) approach is used to improve two first-order, two-stage training algorithms, namely back-propagation (BP) and hidden weight optimization (HWO), which alternately update the input weights and solve linear equations for the output weights. Results show that the proposed OIG approach greatly enhances the performance of the first-order algorithms, often allowing them to rival the popular Levenberg-Marquardt approach with far less computation. It is shown that HWO is equivalent to BP with a whitening transformation applied to the inputs, so HWO effectively combines the whitening transformation with learning. Thus, OIG-improved HWO could be a significant building block for more complex deep learning architectures.
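
The equivalence stated in this abstract can be checked numerically on a toy case. The sketch below is an illustration only, not the authors' code: it assumes a single linear layer, a squared-error objective on one training pattern, and hypothetical helper names (`gradient`, `step`, `compare_updates`). It takes one gradient step in a network fed the transformed inputs A·x, maps the result back to the raw-input network, and compares it with a step on the raw-input network whose gradient has been multiplied by R = AᵀA.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def gradient(W, x, t):
    """Gradient of 0.5 * ||W x - t||^2 with respect to W for one pattern."""
    err = [y - ti for y, ti in zip(matvec(W, x), t)]
    return [[e * xi for xi in x] for e in err]

def step(W, G, z):
    """Plain gradient-descent update W - z * G."""
    return [[w - z * g for w, g in zip(wr, gr)] for wr, gr in zip(W, G)]

def compare_updates(W2, A, x, t, z=0.1):
    # W2 acts on transformed inputs A x; W1 = W2 A is the equivalent
    # network acting on the raw inputs x.
    W1 = matmul(W2, A)
    # Path 1: step in the transformed-input network, then map back.
    W2_new = step(W2, gradient(W2, matvec(A, x), t), z)
    mapped = matmul(W2_new, A)
    # Path 2: step in the raw-input network with the gradient times R = A^T A.
    R = matmul(transpose(A), A)
    direct = step(W1, matmul(gradient(W1, x, t), R), z)
    return mapped, direct

# A diagonal A corresponds to per-input gains.
gains = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 3.0]]
W2 = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
mapped, direct = compare_updates(W2, gains, [1.0, -2.0, 0.5], [1.0, 0.0])
```

The two results agree up to floating-point rounding, for diagonal and non-diagonal A alike. Choosing the matrix that maximizes learning in each iteration, which is the second-order part of the method, is not shown here.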

2.0 | LG | Aug 11, 2023

Automated Sizing and Training of Efficient Deep Autoencoders using Second Order Algorithms

Kanishka Tyagi, Chinmay Rane, Michael Manry

We propose a multi-step training method for designing generalized linear classifiers. First, an initial multi-class linear classifier is found through regression. Then validation error is minimized by pruning unnecessary inputs. Simultaneously, desired outputs are improved via a method similar to the Ho-Kashyap rule. Next, the output discriminants are scaled to be net functions of sigmoidal output units in a generalized linear classifier. We then develop a family of batch training algorithms for the multilayer perceptron (MLP) that optimizes its hidden layer size and number of training epochs. Next, we combine pruning with a growing approach. Later, the input units are scaled to be the net function of the sigmoidal output units, which are then fed as input to the MLP. We then propose resulting improvements in each of the deep learning blocks, thereby improving the overall performance of the deep architecture. We discuss the principles and formulation of learning algorithms for deep autoencoders. We investigate several problems in deep autoencoder networks, including training issues; the theoretical, mathematical, and experimental justification that the networks are linear; optimizing the number of hidden units in each layer; and determining the depth of the deep learning model. A direct implication of the current work is the ability to construct fast deep learning models using desktop-level computational resources. This, in our opinion, promotes our design philosophy of building small but powerful algorithms. Performance gains are demonstrated at each step. Using widely available datasets, the final network's ten-fold testing error is shown to be less than that of several other linear classifiers, generalized linear classifiers, multilayer perceptrons, and deep learners reported in the literature.

2.0 | LG | Aug 10, 2023

Optimizing Performance of Feedforward and Convolutional Neural Networks through Dynamic Activation Functions

Chinmay Rane, Kanishka Tyagi, Michael Manry

Deep learning training algorithms have been a huge success in recent years in many fields, including speech, text, image, and video. Ever deeper networks have been proposed with great success, with ResNet structures having around 152 layers. Shallow convolutional neural networks (CNNs) remain an active research area in which some phenomena are still unexplained. The activation functions used in a network are of utmost importance, as they provide its nonlinearity. ReLUs are the most commonly used activation functions. We show a complex piecewise-linear (PWL) activation in the hidden layer. We show that these PWL activations work much better than ReLU activations in our convolutional neural networks and multilayer perceptrons. Result comparisons in PyTorch for shallow and deep CNNs are given to further strengthen our case.
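
To make the idea of a piecewise-linear activation concrete, here is a minimal sketch. The abstract does not specify the parameterization, so everything here is an assumption for illustration: the function name `pwl_activation`, the representation as sorted hinge locations with one output value per hinge, linear interpolation between hinges, and a constant output outside the hinge range.

```python
from bisect import bisect_right

def pwl_activation(x, hinges, values):
    """Piecewise-linear activation (illustrative parameterization).

    hinges: strictly increasing hinge locations.
    values: activation output at each hinge (same length as hinges).
    Between hinges the output is linearly interpolated; outside the
    hinge range it is held constant (an assumption of this sketch).
    """
    if x <= hinges[0]:
        return values[0]
    if x >= hinges[-1]:
        return values[-1]
    i = bisect_right(hinges, x) - 1          # index of the hinge to the left of x
    h0, h1 = hinges[i], hinges[i + 1]
    w = (x - h0) / (h1 - h0)                 # fractional position between hinges
    return (1.0 - w) * values[i] + w * values[i + 1]

# With these settings the PWL function matches ReLU on [-1, 1].
relu_like_hinges = [-1.0, 0.0, 1.0]
relu_like_values = [0.0, 0.0, 1.0]
outputs = [pwl_activation(x, relu_like_hinges, relu_like_values)
           for x in (-0.5, 0.0, 0.5)]
```

If the hinge values were treated as trainable parameters, each hidden unit could learn its own activation shape rather than being fixed to ReLU. That reading is a guess at what "dynamic" means in the title, not something the abstract states.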

2.6 | LG | Nov 18, 2024

Making Sigmoid-MSE Great Again: Output Reset Challenges Softmax Cross-Entropy in Neural Network Classification

Kanishka Tyagi, Chinmay Rane, Ketaki Vaidya et al.

This study presents a comparative analysis of two objective functions, Mean Squared Error (MSE) and Softmax Cross-Entropy (SCE) for neural network classification tasks. While SCE combined with softmax activation is the conventional choice for transforming network outputs into class probabilities, we explore an alternative approach using MSE with sigmoid activation. We introduce the Output Reset algorithm, which reduces inconsistent errors and enhances classifier robustness. Through extensive experiments on benchmark datasets (MNIST, CIFAR-10, and Fashion-MNIST), we demonstrate that MSE with sigmoid activation achieves comparable accuracy and convergence rates to SCE, while exhibiting superior performance in scenarios with noisy data. Our findings indicate that MSE, despite its traditional association with regression tasks, serves as a viable alternative for classification problems, challenging conventional wisdom about neural network training strategies.
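
The two objective functions compared in this abstract can be written out for a single training pattern as follows. This is a minimal sketch with assumed function names and a one-hot target encoding; it does not implement the Output Reset algorithm, whose details are not given in the abstract.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax_cross_entropy(logits, target_class):
    """SCE for one pattern: -log softmax(logits)[target_class]."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target_class]

def sigmoid_mse(logits, target_class):
    """MSE between sigmoid outputs and a one-hot target for one pattern."""
    errs = [(sigmoid(v) - (1.0 if k == target_class else 0.0)) ** 2
            for k, v in enumerate(logits)]
    return sum(errs) / len(errs)

# Two classes with equal logits: the network is maximally undecided.
sce = softmax_cross_entropy([0.0, 0.0], 0)   # equals ln 2
mse = sigmoid_mse([0.0, 0.0], 0)             # equals 0.25
```

One visible difference is that sigmoid-MSE is bounded by 1 per pattern, while SCE grows without bound as the logit of the correct class falls behind the others.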

3.3 | INS-DET | Mar 31, 2020

Efficient Machine Learning Approach for Optimizing the Timing Resolution of a High Purity Germanium Detector

R. W. Gladen, V. A. Chirayath, A. J. Fairchild et al.

We describe here an efficient machine-learning-based approach for the optimization of parameters used for extracting the arrival time of waveforms, in particular those generated by the detection of 511 keV annihilation gamma-rays by a 60 cm³ coaxial high purity germanium detector (HPGe). The method utilizes a type of artificial neural network (ANN) called a self-organizing map (SOM) to cluster the HPGe waveforms based on the shape of their rising edges. The optimal timing parameters for HPGe waveforms belonging to a particular cluster are found by minimizing the time difference between the HPGe signal and a signal produced by a BaF₂ scintillation detector. Applying these variable timing parameters to the HPGe signals achieved a gamma-coincidence timing resolution of ~4.3 ns at the 511 keV photo peak (defined as 511 ± 50 keV) and a timing resolution of ~6.5 ns for the entire gamma spectrum, without rejecting any valid pulses. This timing resolution approaches the best obtained by analog nuclear electronics, without the corresponding complexities of analog optimization procedures. We further demonstrate the universality and efficacy of the machine learning approach by applying the method to the generation of secondary electron time-of-flight spectra following the implantation of energetic positrons on a sample.
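
The clustering step described in this abstract can be sketched with a very small one-dimensional SOM. Everything below is an assumption for illustration rather than the authors' setup: the four-sample "rising edges" are synthetic, the map has only two nodes, and the learning-rate and neighbourhood schedules are arbitrary choices. The per-cluster optimization of timing parameters against the BaF₂ reference signal is not shown.

```python
import math

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda j: sq_dist(nodes[j], x))

def train_som(data, init_nodes, epochs=20, lr0=0.5, sigma0=0.5):
    """Train a 1-D SOM; node j sits at grid position j."""
    nodes = [list(w) for w in init_nodes]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                      # learning rate shrinks over time
        sigma = sigma0 * decay + 1e-3         # neighbourhood width shrinks too
        for x in data:
            bmu = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                # Gaussian neighbourhood: the BMU gets the full update,
                # its grid neighbours a smaller one.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return nodes

# Synthetic normalized rising edges: fast rise vs. slow rise.
fast = [[0.0, 0.8, 1.0, 1.0], [0.0, 0.7, 1.0, 1.0], [0.1, 0.9, 1.0, 1.0]]
slow = [[0.0, 0.2, 0.5, 1.0], [0.0, 0.3, 0.6, 1.0], [0.0, 0.1, 0.4, 0.9]]
som = train_som(fast + slow, init_nodes=[fast[0], slow[0]])

fast_cluster = best_matching_unit(som, [0.0, 0.85, 1.0, 1.0])
slow_cluster = best_matching_unit(som, [0.0, 0.2, 0.5, 0.95])
```

After training, waveforms with similar rising-edge shapes share a best-matching node, so a separate set of timing parameters could be chosen for each node.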