Zhifeng Deng

15.0, DG, Jun 4

The Exponential of Skew-Symmetric Matrices: A Nearby Inverse and Efficient Computation of Derivatives

Zhifeng Deng, P.-A. Absil, Kyle A. Gallivan et al.

The matrix exponential restricted to skew-symmetric matrices has numerous applications, notably in view of its interpretation as the Lie group exponential and Riemannian exponential for the special orthogonal group. We characterize the invertibility of the derivative of the skew-restricted exponential, thereby providing a simple expression of the tangent conjugate locus of the orthogonal group. In view of the skew restriction, this characterization differs from the classic result on the invertibility of the derivative of the exponential of real matrices. Based on this characterization, for every skew-symmetric matrix $A$ outside the (zero-measure) tangent conjugate locus, we explicitly construct the domain and image of a smooth inverse -- which we term \emph{nearby logarithm} -- of the skew-restricted exponential around $A$. This nearby logarithm reduces to the classic principal logarithm of special orthogonal matrices when $A=\mathbf{0}$. Symbolic formulae for the derivative and its inverse are derived and implemented efficiently. Extensive numerical experiments show that the proposed formulae are up to $3.9$ times and $3.6$ times faster than the current state-of-the-art robust formulae for the differentiation and its inversion, respectively.
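
To make the role of a logarithm "around $A$" concrete, here is a minimal sketch for the special case of $3 \times 3$ skew-symmetric matrices only. It uses the classical Rodrigues formula and the principal logarithm on SO(3), not the paper's construction or formulae, and the helper names (`hat`, `exp_skew3`, `log_so3`, `max_abs_diff`) are hypothetical. It shows that the principal logarithm inverts the exponential for a matrix near $\mathbf{0}$ but returns a different skew-symmetric matrix when the rotation angle exceeds $\pi$.

```python
import math

def hat(w):
    """Map a 3-vector to the corresponding 3x3 skew-symmetric matrix."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(A, B):
    """Plain 3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_skew3(A):
    """Rodrigues formula: exp(A) = I + (sin t / t) A + ((1 - cos t) / t^2) A^2."""
    w = (A[2][1], A[0][2], A[1][0])            # axis vector of A
    t = math.sqrt(sum(c * c for c in w))       # rotation angle
    if t < 1e-8:                               # limits of the coefficients as t -> 0
        a, b = 1.0, 0.5
    else:
        a, b = math.sin(t) / t, (1.0 - math.cos(t)) / t**2
    A2 = matmul(A, A)
    return [[(1.0 if i == j else 0.0) + a * A[i][j] + b * A2[i][j]
             for j in range(3)] for i in range(3)]

def log_so3(Q):
    """Principal logarithm on SO(3); ill-conditioned as the angle approaches pi."""
    c = max(-1.0, min(1.0, (Q[0][0] + Q[1][1] + Q[2][2] - 1.0) / 2.0))
    t = math.acos(c)                           # angle in [0, pi]
    s = 0.5 if t < 1e-8 else t / (2.0 * math.sin(t))
    return [[s * (Q[i][j] - Q[j][i]) for j in range(3)] for i in range(3)]

def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two 3x3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# Near zero (angle < pi): the principal logarithm recovers A.
A = hat((0.3, -0.2, 0.5))
err_near = max_abs_diff(log_so3(exp_skew3(A)), A)

# Angle 4 > pi: the principal logarithm returns a different preimage of exp(B).
B = hat((0.0, 0.0, 4.0))
L = log_so3(exp_skew3(B))
err_far = max_abs_diff(L, B)
```

In this sketch `err_near` is at rounding level while `err_far` is not: `L` corresponds to the angle $4 - 2\pi$ about the same axis, so a different local inverse is needed to recover `B` itself.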

CL, Jul 22, 2025

TTS-1 Technical Report

Oleg Atamanenko, Anna Chalova, Joseph Coombes et al.

We introduce Inworld TTS-1, a set of two Transformer-based autoregressive text-to-speech (TTS) models. The larger model, TTS-1-Max, has 8.8B parameters and is designed for utmost quality and expressiveness in demanding applications. TTS-1 is the more efficient model, with 1.6B parameters, built for real-time speech synthesis and on-device use cases. By scaling train-time compute and applying a sequential process of pre-training, fine-tuning, and RL-alignment of the speech-language model (SpeechLM) component, both models achieve state-of-the-art performance on a variety of benchmarks, demonstrating exceptional quality while relying purely on in-context learning of the speaker's voice. Inworld TTS-1 and TTS-1-Max can generate high-resolution 48 kHz speech with low latency, and support 11 languages with fine-grained emotional control and non-verbal vocalizations through audio markups. We additionally open-source our training and modeling code under an MIT license.
