Riccardo Simionato

SD
h-index: 3
3 papers
8 citations
Novelty: 40%
AI Score: 22

3 Papers

SD · Aug 22, 2024
Modeling Time-Variant Responses of Optical Compressors with Selective State Space Models

Riccardo Simionato, Stefano Fasciani

This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models. The proposed approach surpasses previous methods based on recurrent layers by employing a Selective State Space block to encode the input audio. It features a refined technique integrating Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically, conditioning the compression's attack and release phases according to external parameters. The proposed architecture is well-suited for low-latency and real-time applications, which are crucial in live audio processing. The method has been validated on the analog optical compressors TubeTech CL 1B and Teletronix LA-2A, which possess distinct characteristics. Evaluation is performed using quantitative metrics and subjective listening tests, comparing the proposed method with other state-of-the-art models. Results show that our black-box modeling methods outperform all others, achieving accurate emulation of the compression process for settings both seen and unseen during training. We further show a correlation between this accuracy and the sampling density of the control parameters in the dataset, and identify settings with fast attack and slow release as the most challenging to emulate.
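The conditioning idea named in the abstract, Feature-wise Linear Modulation (FiLM) followed by a Gated Linear Unit (GLU), can be illustrated with a minimal sketch. This is not the authors' implementation: the function `condition`, the weight matrices `w_gamma` and `w_beta`, and the plain linear projection of the control parameters are all hypothetical choices made for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function used as the GLU gate."""
    return 1.0 / (1.0 + math.exp(-x))


def film(features, gamma, beta):
    """FiLM: scale and shift each feature by a conditioning-dependent pair."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


def glu(values):
    """GLU: the first half of the vector is gated by the sigmoid of the second half."""
    half = len(values) // 2
    return [a * sigmoid(b) for a, b in zip(values[:half], values[half:])]


def condition(features, params, w_gamma, w_beta):
    """Hypothetical block: control parameters -> (gamma, beta) -> FiLM -> GLU.

    `params` stands for external controls (e.g. normalized attack/release);
    the linear maps w_gamma and w_beta are assumptions, not taken from the paper.
    """
    gamma = [sum(w * p for w, p in zip(row, params)) for row in w_gamma]
    beta = [sum(w * p for w, p in zip(row, params)) for row in w_beta]
    return glu(film(features, gamma, beta))
```

In this sketch the GLU halves the feature dimension, so a block that must preserve width would project to twice the width before gating.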

SD · Sep 10, 2024
Sines, Transient, Noise Neural Modeling of Piano Notes

Riccardo Simionato, Stefano Fasciani

This paper introduces a novel method for emulating piano sounds. We propose to exploit the sines, transient, and noise decomposition to design a differentiable spectral modeling synthesizer replicating piano notes. Three sub-modules learn these components from piano recordings and generate the corresponding harmonic, transient, and noise signals. Splitting the emulation into three independently trainable models reduces the complexity of the modeling tasks. The quasi-harmonic content is produced using a differentiable sinusoidal model guided by physics-derived formulas, whose parameters are automatically estimated from audio recordings. The noise sub-module uses a learnable time-varying filter, and the transients are generated using a deep convolutional network. Starting from single notes, we emulate the coupling between different keys in trichords with a convolution-based network. Results show that the model matches the partial distribution of the target, while predicting the energy in the higher part of the spectrum presents more challenges. The energy distribution in the spectra of the transient and noise components is accurate overall. While the model is more computationally and memory efficient, perceptual tests reveal limitations in accurately modeling the attack phase of notes. Despite this, it generally achieves perceptual accuracy in emulating single notes and trichords.
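The abstract does not say which physics-derived formulas guide the sinusoidal model. As an assumption for illustration, the sketch below uses the standard stiff-string relation for quasi-harmonic partials, f_n = n · f0 · sqrt(1 + B · n²), where B is the inharmonicity coefficient, and sums exponentially decaying sinusoids. The function names, the 1/k amplitude roll-off, and the per-partial decay law are hypothetical and are not taken from the paper.

```python
import math


def partial_frequencies(f0, inharmonicity, n_partials):
    """Stiff-string partials: f_n = n * f0 * sqrt(1 + B * n^2) (assumed formula)."""
    return [
        n * f0 * math.sqrt(1.0 + inharmonicity * n * n)
        for n in range(1, n_partials + 1)
    ]


def synthesize(f0, inharmonicity, n_partials, decay, duration, sample_rate=16000):
    """Sum of decaying sinusoids at the quasi-harmonic partial frequencies.

    Partials at or above the Nyquist frequency are dropped to avoid aliasing.
    Amplitude 1/k and decay rate decay*k per partial are illustrative choices.
    """
    freqs = [
        f
        for f in partial_frequencies(f0, inharmonicity, n_partials)
        if f < sample_rate / 2
    ]
    n_samples = int(duration * sample_rate)
    signal = []
    for i in range(n_samples):
        t = i / sample_rate
        sample = sum(
            math.exp(-decay * k * t) / k * math.sin(2.0 * math.pi * f * t)
            for k, f in enumerate(freqs, start=1)
        )
        signal.append(sample)
    return signal
```

With B = 0 the partials are exactly harmonic; a positive B stretches the upper partials sharp, the effect that a quasi-harmonic model has to capture.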

SD · May 7, 2024
Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling

Riccardo Simionato, Stefano Fasciani

Artificial neural networks are a promising technique for virtual analog modeling, having shown particular success in emulating distortion circuits. Despite their potential, enhancements are needed to enable effect parameters to influence the network's response and to achieve a low-latency output. While hybrid solutions, which incorporate both analytical and black-box techniques, offer certain advantages, black-box approaches, such as neural networks, can be preferable in contexts where rapid deployment, simplicity, or adaptability are required, and where understanding the internal mechanisms of the system is less critical. In this article, we explore the application of recent machine learning advancements for virtual analog modeling. We compare State-Space models and Linear Recurrent Units against the more common LSTM networks on a variety of audio effects. We evaluate the performance and limitations of these models using multiple metrics, providing insights for future research and development. Our metrics aim to assess the models' ability to accurately replicate the signal's energy and frequency content, with a particular focus on transients. The Feature-wise Linear Modulation method is employed to incorporate effect parameters that influence the network's response, enabling dynamic adaptability based on specified conditions. Experimental results suggest that LSTM networks offer an advantage in emulating distortions and equalizers, although performance differences are sometimes subtle yet statistically significant. On the other hand, encoder-decoder configurations of LSTM networks and State-Space models excel in modeling saturation and compression, effectively managing the dynamic aspects inherent in these effects. However, none of the models effectively emulates the low-pass filter, and Linear Recurrent Units show inconsistent performance across various audio effects.
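To make one of the compared recurrences concrete, the sketch below shows the core of a Linear Recurrent Unit as it is commonly described: a diagonal complex-valued linear state update with a real-valued readout. It is a simplified illustration, not the configuration evaluated in the study; the function name, the scalar input, and the parameter names `magnitudes`, `phases`, `b`, `c`, and `d` are assumptions.

```python
import cmath


def linear_recurrent_unit(inputs, magnitudes, phases, b, c, d):
    """Diagonal linear recurrence: h_t = lambda * h_{t-1} + b * x_t,
    y_t = Re(sum(c * h_t)) + d * x_t.

    Each eigenvalue lambda is given in polar form; magnitudes below 1 keep
    the recurrence stable. All parameters here are illustrative placeholders.
    """
    lambdas = [cmath.rect(r, theta) for r, theta in zip(magnitudes, phases)]
    state = [0j] * len(lambdas)
    outputs = []
    for x in inputs:
        # Element-wise (diagonal) update: no nonlinearity inside the recurrence.
        state = [lam * h + bi * x for lam, h, bi in zip(lambdas, state, b)]
        y = sum((ci * h).real for ci, h in zip(c, state)) + d * x
        outputs.append(y)
    return outputs
```

Because the update is linear and diagonal, each state channel behaves like a one-pole complex resonator, in contrast to the gated nonlinear state update of an LSTM.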