Darima Mylzenova

3 Papers

62.0 · SD · Apr 1
PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual Voice Cloning and Inference-Time Model Fusion

Vikentii Pankov, Artem Gribul, Oktai Tatanov et al.

We present PFluxTTS, a hybrid text-to-speech system addressing three gaps in flow-matching TTS: the stability-naturalness trade-off, weak cross-lingual voice cloning, and limited audio quality from low-rate mel features. Our contributions are: (1) a dual-decoder design combining duration-guided and alignment-free models through inference-time vector-field fusion; (2) robust cloning using a sequence of speech-prompt embeddings in a FLUX-based decoder, preserving speaker traits across languages without prompt transcripts; and (3) a modified PeriodWave vocoder with super-resolution to 48 kHz. On cross-lingual in-the-wild data, PFluxTTS clearly outperforms F5-TTS, FishSpeech, and SparkTTS, matches ChatterBox in naturalness (MOS 4.11) while achieving 23% lower WER (6.9% vs. 9.0%), and surpasses ElevenLabs in speaker similarity (+0.32 SMOS). The system remains robust in challenging scenarios where most open-source models fail, while requiring only short reference audio and no extra training. Audio demos are available at https://braskai.github.io/pfluxtts/
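The abstract does not state how the two decoders' vector fields are fused at inference time. As a rough illustration only, the sketch below assumes the simplest possible rule, a fixed convex combination of two fields, integrated with fixed-step Euler as is common for flow-matching samplers. The names `fuse_fields`, `euler_sample`, `toy_field_a`, `toy_field_b`, and the `weight` parameter are hypothetical and do not come from the paper; the toy fields merely stand in for the duration-guided and alignment-free decoders.

```python
from typing import Callable, List

Vector = List[float]
VectorField = Callable[[Vector, float], Vector]


def fuse_fields(field_a: VectorField, field_b: VectorField, weight: float) -> VectorField:
    """Blend two vector fields as weight * a + (1 - weight) * b.

    ASSUMPTION: a fixed convex combination; the paper's actual fusion rule
    is not described in the abstract.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def fused(x: Vector, t: float) -> Vector:
        va = field_a(x, t)
        vb = field_b(x, t)
        return [weight * a + (1.0 - weight) * b for a, b in zip(va, vb)]

    return fused


def euler_sample(field: VectorField, x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = field(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-ins for the two decoders: constant fields.
def toy_field_a(x: Vector, t: float) -> Vector:
    return [1.0 for _ in x]


def toy_field_b(x: Vector, t: float) -> Vector:
    return [-1.0 for _ in x]


fused_field = fuse_fields(toy_field_a, toy_field_b, weight=0.75)
sample = euler_sample(fused_field, x0=[0.0, 0.0], steps=10)
```

With these toy fields the blended velocity is constant, so the sample simply moves in a straight line; in a real system each field would be a neural decoder evaluated on the current noisy mel features.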

CV · Jan 10, 2020
Seismic horizon detection with neural networks

Alexander Koryagin, Darima Mylzenova, Roman Khudorozhkov et al.

Over the last few years, Convolutional Neural Networks (CNNs) have been successfully adopted in numerous domains to solve various image-related tasks, ranging from simple classification to fine border annotation. Tracking seismic horizons is no different, and many papers propose using such models to avoid time-consuming hand-picking. Unfortunately, most of them are either (i) trained on synthetic data, which can't fully represent the complexity of subterranean structures, (ii) trained and tested on the same cube, or (iii) lacking in reproducibility and precise descriptions of the model-building process. With all that in mind, the main contribution of this paper is open-source research on applying a binary segmentation approach to horizon detection on multiple real seismic cubes, with a focus on inter-cube generalization of the predictive model.

GEO-PH · Jan 10, 2020
SeismiQB -- a novel framework for deep learning with seismic data

Alexander Koryagin, Roman Khudorozhkov, Sergey Tsimfer et al.

In recent years, Deep Neural Networks have been successfully adopted in numerous domains to solve various image-related tasks, ranging from simple classification to fine border annotation. Naturally, many researchers have proposed using them to solve geological problems. Unfortunately, many seismic processing tools were developed years before the era of machine learning, including SEG-Y, the most popular data format for storing seismic cubes. Its slow loading speed heavily hampers experimentation speed, which is essential for getting acceptable results. Worse yet, there is no widely used format for storing surfaces inside the volume (for example, seismic horizons). To address these problems, we've developed an open-source Python framework with an emphasis on working with neural networks that provides convenient tools for (i) fast loading of seismic cubes in multiple data formats and converting between them, (ii) generating crops of a desired shape and augmenting them with various transformations, and (iii) pairing cube data with labeled horizons or other types of geobodies.