39.1 SI Jun 2
Structural properties of the implicit function defined by an integral self-consistency equation
Ivan Viakhirev
We study the integral equation $\int_0^m ηρ(η)/(C-η)\,dη= 1$ with $C>m$, where $ρ$ is a $C^1$ probability density on $[0,M]$ vanishing polynomially at $η=M$. Setting $\mathcal{I}^+(m) := \lim_{C \downarrow m}\int_0^m ηρ(η)/(C-η)\,dη$ and $Ω:= \{m \in (0,M) : \mathcal{I}^+(m) > 1\}$, the equation determines $C$ implicitly as a function of $m$ on $Ω$, and our object of study is the dimensionless ratio $β(m) := C(m)/m$. Writing $h(η) := ηρ(η)$, our main theorem establishes openness of $Ω$, $C^1$-smoothness of $β$, a sign formula identifying $β'(m)$ with a positively-weighted integral of $dh/d\lnη$, transfer of monotonicity from $h$ to $β$, and existence of an interior critical point of $β$ when $h$ is unimodal and two technical hypotheses hold. Numerically, $β$ has a single critical point in seven log-concave test densities (mostly Beta-type), in support of a separate uniqueness conjecture. A bimodal density that violates both unimodality and log-concavity exhibits three critical points; this shows that dropping the two hypotheses jointly admits multiple critical points, but does not separate their roles.
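As a rough illustration of how the equation determines $C$ from $m$, the following minimal sketch solves it numerically for one assumed test density. The choice $ρ(η) = 6η(1-η)$ (a Beta(2,2) density on $[0,1]$, so $M = 1$), the Simpson quadrature, the bisection bracket, and the function names `h`, `integral`, `solve_C`, and `beta` are all assumptions made for this example, not the paper's method.

```python
def h(eta):
    # h(eta) = eta * rho(eta), with the ASSUMED density rho(eta) = 6*eta*(1-eta) on [0, 1].
    return eta * 6.0 * eta * (1.0 - eta)


def integral(C, m, n=2000):
    # Composite Simpson approximation of int_0^m h(eta) / (C - eta) d(eta).
    # Requires C > m and an even number of subintervals n.
    step = m / n
    total = h(0.0) / C + h(m) / (C - m)
    for i in range(1, n):
        eta = i * step
        weight = 4.0 if i % 2 else 2.0
        total += weight * h(eta) / (C - eta)
    return total * step / 3.0


def solve_C(m, tol=1e-12):
    # Bisection on C in (m, m + 1]. The integral is decreasing in C.
    # At C = m + 1 it is at most int_0^m h <= 0.5 < 1 for this density,
    # and it blows up as C approaches m because h(m) > 0 for 0 < m < 1.
    lo, hi = m + 1e-9, m + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if integral(mid, m) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta(m):
    # Dimensionless ratio beta(m) = C(m) / m.
    return solve_C(m) / m


if __name__ == "__main__":
    for m in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(m, beta(m))
```

For small $m$ the root sits extremely close to $m$ for this density, so a fixed-grid quadrature like the one above is only reasonable for moderate $m$; a singularity-aware integrator would be needed otherwise.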
13.0 SD Apr 21
Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance
Ivan Viakhirev, Kirill Borodin, Mikhail Gorodnichev et al.
Multi-branch deep neural networks such as AASIST3 achieve performance comparable to the state of the art in audio anti-spoofing, yet their internal decision dynamics remain opaque to traditional input-level saliency methods. While existing interpretability efforts largely focus on visualizing input artifacts, the way individual architectural branches cooperate or compete under different spoofing attacks is not well characterized. This paper develops a framework for interpreting AASIST3 at the component level. Intermediate activations from fourteen branches and global attention modules are modeled with covariance operators whose leading eigenvalues form low-dimensional spectral signatures. These signatures are used to train a CatBoost meta-classifier that yields TreeSHAP-based branch attributions, which we convert into normalized contribution shares and confidence scores ($C_b$) to quantify the model's operational strategy. By analyzing 13 spoofing attacks from the ASVspoof 2019 benchmark, we identify four operational archetypes, ranging from Effective Specialization (e.g., A09, Equal Error Rate (EER) 0.04%, C=1.56) to Ineffective Consensus (e.g., A08, EER 3.14%, C=0.33). Crucially, our analysis exposes a Flawed Specialization mode in which the model places high confidence in an incorrect branch, leading to severe performance degradation for attacks A17 and A18 (EER 14.26% and 28.63%, respectively). These quantitative findings link internal architectural strategy directly to empirical reliability, highlighting specific structural dependencies that standard performance metrics overlook.
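To make the idea of a spectral signature concrete, here is a minimal sketch that takes a small matrix of activations (rows are samples, columns are features), forms the empirical covariance matrix, and returns its leading eigenvalues. The finite-dimensional sample covariance, the power iteration with deflation, and the names `covariance` and `leading_eigenvalues` are assumptions chosen for illustration; the abstract does not specify how the covariance operators or their eigenvalues are actually computed.

```python
import math


def covariance(samples):
    # Unbiased empirical covariance matrix of row-wise samples.
    n, d = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in samples]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvalues(matrix, k, iters=500):
    # Top-k eigenvalues of a symmetric positive semidefinite matrix,
    # found by power iteration followed by deflation.
    # ASSUMPTION: the all-ones start vector is not orthogonal to the
    # eigenvector being sought; a random start would be more robust.
    d = len(matrix)
    a = [row[:] for row in matrix]
    eigenvalues = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-14:
                break  # remaining spectrum is numerically zero
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        eigenvalues.append(lam)
        # Deflate: remove the component just found.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return eigenvalues


if __name__ == "__main__":
    # Hypothetical toy activations, not data from the paper.
    activations = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    print(leading_eigenvalues(covariance(activations), k=2))
```

In a pipeline like the one described, one such short eigenvalue vector per branch would be concatenated into the feature vector passed to the meta-classifier; that wiring is likewise an assumption here.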
71.4 LG Mar 31
From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales
Ivan Viakhirev, Kirill Borodin, Grach Mkrtchian
Hallucinations in large ASR models present a critical safety risk. In this work, we propose the \textit{Spectral Sensitivity Theorem}, which predicts a phase transition in deep networks from a dispersive regime (signal decay) to an attractor regime (rank-1 collapse) governed by layer-wise gain and alignment. We validate this theory by analyzing the eigenspectra of activation graphs in Whisper models (Tiny to Large-v3-Turbo) under adversarial stress. Our results confirm the theoretical prediction: intermediate models exhibit \textit{Structural Disintegration} (Regime I), characterized by a $13.4\%$ collapse in Cross-Attention rank. Conversely, large models enter a \textit{Compression-Seeking Attractor} state (Regime II), where Self-Attention actively compresses rank ($-2.34\%$) and hardens the spectral slope, decoupling the model from acoustic evidence.