Henrique K. Miyamoto

3.3 | LG | Oct 28, 2022

The Fisher-Rao Loss for Learning under Label Noise

Henrique K. Miyamoto, Fábio C. C. Meneghetti, Sueli I. R. Costa

Choosing a suitable loss function is essential when learning by empirical risk minimisation. In many practical cases, the datasets used for training a classifier may contain incorrect labels, which motivates the use of loss functions that are inherently robust to label noise. In this paper, we study the Fisher-Rao loss function, which emerges from the Fisher-Rao distance in the statistical manifold of discrete distributions. We derive an upper bound for the performance degradation in the presence of label noise, and analyse the learning speed of this loss. Comparing with other commonly used losses, we argue that the Fisher-Rao loss provides a natural trade-off between robustness and training dynamics. Numerical experiments with synthetic and MNIST datasets illustrate this performance.
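As a rough illustration of the quantity involved, the Fisher-Rao distance between two categorical distributions $p$ and $q$ has the closed form $2\arccos\big(\sum_i \sqrt{p_i q_i}\big)$. The sketch below computes it in plain Python. The function names are hypothetical, and taking the loss to be the squared distance between a one-hot label and the predicted distribution is an assumption made here for illustration, not a definition quoted from the abstract.

```python
import math

def fisher_rao_distance(p, q):
    """Fisher-Rao (geodesic) distance between two categorical distributions."""
    # Bhattacharyya coefficient: sum_i sqrt(p_i * q_i), which lies in [0, 1].
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    bc = min(1.0, max(0.0, bc))  # clamp to guard against rounding error
    return 2.0 * math.acos(bc)

def fisher_rao_loss(label, probs):
    """Assumed loss: squared distance between the one-hot label and probs."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(probs))]
    return fisher_rao_distance(one_hot, probs) ** 2
```

Since the distance never exceeds $\pi$, a loss built this way is bounded by $\pi^2$, whereas cross-entropy grows without bound as the predicted probability of the labelled class tends to zero.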

1.0 | IT | Jun 25

Mismatched Exponents for Deterministic and Randomised Noise-Guessing Decoding

Henrique K. Miyamoto, Richard Combes, Sheng Yang

We study both the deterministic and randomised variants of noise-guessing decoding in additive memoryless channels. The error and complexity exponents of such decoding schemes are analysed under mismatched decoding metrics, and then specialised to matched, $α$-tilted, and universal decoding metrics. The $α$-tilted metric is proportional to the $α$-th power ($α>0$) of the true noise distribution. In deterministic decoding, the tilting operation does not affect the performance: all these metrics are equivalent to the matched one ($α=1$), and are optimal for both average error and complexity. On the other hand, in randomised decoding, the matched metric is not optimal for complexity exponents; we show that the decoder needs to tune the parameter $α$ according to the code rate in order to simultaneously achieve both optimal exponents using a decoding metric in that family. Finally, a universal decoding metric based on the empirical entropy of the noise sequence achieves both optimal exponents, independently of the channel law and uniformly over code rates, for the deterministic and randomised variants.
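A small sketch may help make the $α$-tilted metric concrete. It assumes a memoryless noise distribution with strictly positive probabilities on a finite alphabet, builds the tilted distribution $q_α(z) \propto p(z)^α$, and lists all length-$n$ noise sequences in the order a deterministic guesser would try them (most likely first). The function names and the tie-breaking rule are hypothetical choices for illustration. The randomised variant is not modelled here.

```python
import itertools
import math

def tilted_pmf(pmf, alpha):
    """alpha-tilted noise pmf: q(z) proportional to pmf(z) ** alpha."""
    weights = [p ** alpha for p in pmf]
    total = sum(weights)
    return [w / total for w in weights]

def guessing_order(pmf, n, alpha):
    """Length-n noise sequences sorted from most to least likely under the
    product alpha-tilted metric. Ties keep enumeration order (an assumption)."""
    q = tilted_pmf(pmf, alpha)  # entries must be strictly positive

    def log_metric(seq):
        # Sum over symbol counts so sequences of the same type tie exactly.
        return sum(seq.count(s) * math.log(q[s]) for s in range(len(q)))

    sequences = itertools.product(range(len(pmf)), repeat=n)
    return sorted(sequences, key=log_metric, reverse=True)
```

Because raising to a positive power is monotone, `guessing_order(pmf, n, alpha)` gives the same list for every `alpha > 0` in this setting, which is consistent with the statement that tilting does not change deterministic decoding.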

2 Papers