Steven Van Kuyk

3papers

195citations

Novelty40%

AI Score22

Ranked #184,837 of 201,326 authors (top 92%)#1,632 in SD (top 89%)

3 Papers

MLAug 23, 2018

Caveats for information bottleneck in deterministic scenarios

Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk

Information bottleneck (IB) is a method for extracting information from one random variable $X$ that is relevant for predicting another random variable $Y$. To do so, IB identifies an intermediate "bottleneck" variable $T$ that has low mutual information $I(X;T)$ and high mutual information $I(Y;T)$. The "IB curve" characterizes the set of bottleneck variables that achieve maximal $I(Y;T)$ for a given $I(X;T)$, and is typically explored by maximizing the "IB Lagrangian", $I(Y;T) - βI(X;T)$. In some cases, $Y$ is a deterministic function of $X$, including many classification problems in supervised learning where the output class $Y$ is a deterministic function of the input $X$. We demonstrate three caveats when using IB in any situation where $Y$ is a deterministic function of $X$: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of $β$; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when $Y$ is a small perturbation away from being a deterministic function of $X$, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset.

SDAug 20, 2017

An evaluation of intrusive instrumental intelligibility metrics

Steven Van Kuyk, W. Bastiaan Kleijn, Richard C. Hendriks

Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and $\text{sEPSM}^\text{corr}$. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top performing metrics have high performance. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech that was distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI had the highest performance achieving a correlation with listening test scores on average of $ρ=0.92$ and $ρ=0.89$, respectively. The high performance of SIIB may, in part, be the result of SIIBs developers having access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on data sets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, the paper presents a new version of SIIB called $\text{SIIB}^\text{Gauss}$, which has similar performance to SIIB and HASPI, but takes less time to compute by two orders of magnitude.

SDAug 17, 2017

An instrumental intelligibility metric based on information theory

Steven Van Kuyk, W. Bastiaan Kleijn, Richard C. Hendriks

We propose a monaural intrusive instrumental intelligibility metric called speech intelligibility in bits (SIIB). SIIB is an estimate of the amount of information shared between a talker and a listener in bits per second. Unlike existing information theoretic intelligibility metrics, SIIB accounts for talker variability and statistical dependencies between time-frequency units. Our evaluation shows that relative to state-of-the-art intelligibility metrics, SIIB is highly correlated with the intelligibility of speech that has been degraded by noise and processed by speech enhancement algorithms.