A stepped sampling method for video detection using LSTM
This is an incremental improvement for video detection tasks using LSTM models.
The authors tackled the problem of improving temporal information fusion in LSTM models for video detection by proposing a stepped sampling method based on repeated input, resulting in faster training loss convergence, more stable loss after convergence, and higher test accuracy compared to traditional PyTorch samplers.
Artificial neural networks that simulate humans have achieved great success. From the perspective of simulating human memory, we propose a stepped sampler based on "repeated input": data are repeatedly fed to the LSTM model stepwise within a batch. The stepped sampler is used to strengthen the LSTM's ability to fuse temporal information. We tested the stepped sampler on PyTorch's built-in LSTM. Compared with traditional PyTorch samplers, such as the sequential sampler and the batch sampler, the training loss with the proposed stepped sampler converges faster and is more stable after convergence, while the model maintains a higher test accuracy. We also quantified the algorithm of the stepped sampler. We assume that artificial neural networks have human-like characteristics and that human learning methods can be applied to machine learning.
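The abstract does not spell out the sampling algorithm, so the following is only a minimal sketch of one plausible reading of "repeated input": consecutive batches are overlapping windows of sample indices, and each window advances by a step smaller than the batch size, so most samples are presented to the model more than once. The class name `SteppedBatchSampler` and the parameters `num_samples`, `batch_size`, and `step` are hypothetical and are not taken from the paper.

```python
class SteppedBatchSampler:
    """Hypothetical sketch: yields overlapping windows of sample indices.

    Assumption (not from the paper): "repeated input" means each batch
    overlaps the previous one, shifted forward by `step` indices.
    """

    def __init__(self, num_samples, batch_size, step=1):
        if batch_size <= 0 or step <= 0:
            raise ValueError("batch_size and step must be positive")
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.step = step

    def __iter__(self):
        # Slide a window of `batch_size` indices forward by `step` each time;
        # only full windows are produced.
        last_start = self.num_samples - self.batch_size
        for start in range(0, last_start + 1, self.step):
            yield list(range(start, start + self.batch_size))

    def __len__(self):
        # Number of full windows that fit in the dataset.
        if self.num_samples < self.batch_size:
            return 0
        return (self.num_samples - self.batch_size) // self.step + 1


# Five samples, batches of three, advancing one index per batch.
batches = list(SteppedBatchSampler(num_samples=5, batch_size=3, step=1))
print(batches)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

Because the object is an iterable that yields lists of indices, it could in principle be passed as the `batch_sampler` argument of `torch.utils.data.DataLoader`. With `step` equal to `batch_size` the windows no longer overlap, and the sketch reduces to ordinary sequential batching that drops the final incomplete batch.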