Sergey Alekseev

LG
3 papers
34 citations
Novelty 40%
AI Score 37

3 Papers

13.2 LG Apr 13
Subcritical Signal Propagation at Initialization in Normalization-Free Transformers

Sergey Alekseev

We study signal propagation at initialization in transformers through the averaged partial Jacobian norm (APJN), a measure of gradient amplification across layers. We extend APJN analysis to transformers with bidirectional attention and permutation-symmetric input token configurations by deriving recurrence relations for activation statistics and APJNs across layers. Our theory predicts how attention modifies the asymptotic behavior of the APJN at large depth and matches APJNs measured in deep vision transformers. The criticality picture known from residual networks carries over to transformers: the pre-LayerNorm architecture exhibits power-law APJN growth, whereas transformers with LayerNorm replaced by elementwise $\tanh$-like nonlinearities have stretched-exponential APJN growth, indicating that the latter are subcritical. Applied to Dynamic Tanh (DyT) and Dynamic erf (Derf) transformers, the theory explains why these architectures can be more sensitive to initialization and optimization choices and require careful tuning for stable training.
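As a rough illustration of the quantity this abstract is built around, the sketch below estimates an APJN numerically. It is not the paper's transformer analysis. It assumes a toy fully connected network with layers `h_next = W * phi(h)`, Gaussian weights of variance `sigma_w**2 / width`, and the APJN taken as the width-normalized squared Frobenius norm of the Jacobian of the last layer with respect to the first, averaged over random initializations. The function name `apjn` and all of its parameters are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def apjn(depth, width, sigma_w=1.0, phi=math.tanh,
         dphi=lambda h: 1.0 - math.tanh(h) ** 2, n_draws=10, seed=0):
    """Estimate (1/width) * E ||d h^depth / d h^0||_F^2 for a toy MLP.

    Assumed toy model (not the paper's transformer): h_next = W phi(h),
    with W_ij ~ N(0, sigma_w^2 / width) redrawn for every layer and draw.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Random input preactivations and an identity Jacobian at depth 0.
        h = [rng.gauss(0.0, 1.0) for _ in range(width)]
        jac = [[1.0 if i == j else 0.0 for j in range(width)]
               for i in range(width)]
        for _ in range(depth):
            std = sigma_w / math.sqrt(width)
            w = [[rng.gauss(0.0, std) for _ in range(width)]
                 for _ in range(width)]
            # Single-layer Jacobian: d h_next_i / d h_j = W_ij * phi'(h_j).
            layer_jac = [[w_ij * dphi(h_j) for w_ij, h_j in zip(row, h)]
                         for row in w]
            # Chain rule: accumulate the Jacobian back to the input layer.
            jac = matmul(layer_jac, jac)
            # Forward pass to the next layer's preactivations.
            act = [phi(x) for x in h]
            h = [sum(w_ij * a_j for w_ij, a_j in zip(row, act)) for row in w]
        total += sum(x * x for row in jac for x in row) / width
    return total / n_draws


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, apjn(depth, width=32))
```

Sweeping `depth` at fixed `sigma_w` gives a depth profile of the estimate for this toy model. The power-law and stretched-exponential regimes described in the abstract concern transformers with attention and residual connections, which this sketch does not model.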

LG Dec 1, 2020
Problems of representation of electrocardiograms in convolutional neural networks

Iana Sereda, Sergey Alekseev, Aleksandra Koneva et al.

Using electrocardiograms as an example, we demonstrate the characteristic problems that arise when standard convolutional networks are used to model one-dimensional signals containing inexact repeating patterns. We show that these problems are systemic in nature. They stem from how convolutional networks handle composite objects whose parts are not rigidly fixed but have significant mobility. We also demonstrate some counterintuitive effects related to generalization in deep networks.

LG Dec 26, 2018
ECG Segmentation by Neural Networks: Errors and Correction

Iana Sereda, Sergey Alekseev, Aleksandra Koneva et al.

In this study we examine how error correction occurs in an ensemble of deep convolutional networks trained for an important applied problem: segmentation of electrocardiograms (ECG). We also explore the possibility of using information about ensemble errors to evaluate the quality of the data representation built by the network. This possibility arises from the effect of distillation of outliers, which was demonstrated for the ensemble described in this paper.