Sidney Fels

23 papers

319 citations

Novelty 36%

AI Score 43

Ranked #82,908 of 201,326 authors (top 41%); #603 in SD (top 33%)

23 Papers

CV · May 1 · Code

Patient-Specific Optimization for Mandibular Reconstruction Planning with Enhanced Bone Union

Hamidreza Aftabi, John E. Lloyd, Amanda Ding et al.

Mandibular reconstruction with vascularized bone grafts is complicated by donor-host nonunion, and current virtual surgical planning produces a geometric plan rather than a configuration that explicitly promotes bone union. We present OsteoOpt++, an image-to-decision planning loop for patient-specific mandibular reconstruction. A pre-operative computed tomography (CT) is converted into a personalized digital twin through template-to-patient registration and CT-derived updates of the muscle and temporomandibular-joint parameters. Bayesian optimization with an expected-improvement-plus acquisition rule then searches six clinically controllable cut-plane and donor-positioning variables under an apposition-driven objective and a safety-factor-regularized variant. The workflow was evaluated on three generic defects (body, symphysis, and ramus-body) and a total of 3+1 patient-specific cases, with 3 used for optimization and 1 for validation. In the generic cases, against a common surgical approach, cycle-averaged donor-mandible apposition increased by up to 29 percentage points (329% relative); in the patient-specific cases, against the surgeon-implemented day-5 post-operative configuration, by up to 26 percentage points. A 10% sensitivity analysis over eleven modeling parameters capped the change in the apposition-driven objective at 3% for generic cases and 4% for patient-specific cases, and the longitudinal case showed Dice overlap of 0.70 and 0.76 between predicted apposition and year-1 bone formation. Clinically, this provides surgeons with a pre-operative, image-driven recommendation for cut-plane orientation and donor placement that is predicted to improve union conditions over the configurations currently delivered in the operating room. The optimization and patient-specific modeling code is open source at https://github.com/hamidreza-aftabi/OsteoOpt.
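
The acquisition step mentioned in this abstract can be illustrated with the plain expected-improvement rule. This is a minimal sketch, not the OsteoOpt++ code: it assumes the objective is maximized, it does not implement the extra safeguards of the "plus" variant, and the candidate names and surrogate predictions are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement of a Gaussian prediction N(mu, sigma^2) over `best`.

    Assumes maximization. `xi` is an optional exploration margin.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best - xi) * cdf + sigma * pdf


# Hypothetical surrogate predictions (mean, std) of apposition for three plans.
candidates = {"plan_a": (0.30, 0.02), "plan_b": (0.28, 0.10), "plan_c": (0.20, 0.01)}
best_so_far = 0.29

scores = {name: expected_improvement(m, s, best_so_far)
          for name, (m, s) in candidates.items()}
next_plan = max(scores, key=scores.get)  # candidate to simulate next
print(next_plan, scores)
```

In this toy example the uncertain candidate scores highest even though its predicted mean is below the current best, which is the exploration behaviour the rule is designed to provide.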

CV · Mar 23 · Code

OsteoFlow: Lyapunov-Guided Flow Distillation for Predicting Bone Remodeling after Mandibular Reconstruction

Hamidreza Aftabi, Faye Yu, Brooke Switzer et al.

Predicting long-term bone remodeling after mandibular reconstruction would be of great clinical benefit, yet standard generative models struggle to maintain trajectory-level consistency and anatomical fidelity over long horizons. We introduce OsteoFlow, a flow-based framework predicting Year-1 post-operative CT scans from Day-5 scans. Our core contribution is Lyapunov-guided trajectory distillation: unlike one-step distillation, our method distills a continuous trajectory over transport time from a registration-derived stationary velocity field teacher. Combined with a resection-aware image loss, this enforces geometric correspondence without sacrificing generative capacity. Evaluated on 344 paired regions of interest, OsteoFlow significantly outperforms state-of-the-art baselines, reducing mean absolute error in the surgical resection zone by ~20%. This highlights the promise of trajectory distillation for long-term prediction. Code is available on GitHub: OsteoFlow.

SD · Sep 26, 2023

Speech Audio Synthesis from Tagged MRI and Non-Negative Matrix Factorization via Plastic Transformer

Xiaofeng Liu, Fangxu Xing, Maureen Stone et al.

The tongue's intricate 3D structure, comprising localized functional units, plays a crucial role in the production of speech. When measured using tagged MRI, these functional units exhibit cohesive displacements and derived quantities that facilitate the complex process of speech production. Non-negative matrix factorization-based approaches have been shown to estimate the functional units through motion features, yielding a set of building blocks and a corresponding weighting map. Investigating the link between weighting maps and speech acoustics can offer significant insights into the intricate process of speech production. To this end, in this work, we utilize two-dimensional spectrograms as a proxy representation, and develop an end-to-end deep learning framework for translating weighting maps to their corresponding audio waveforms. Our proposed plastic light transformer (PLT) framework is based on directional product relative position bias and single-level spatial pyramid pooling, thus enabling flexible processing of weighting maps with variable size to fixed-size spectrograms, without input information loss or dimension expansion. Additionally, our PLT framework efficiently models the global correlation of wide matrix input. To improve the realism of our generated spectrograms with relatively limited training samples, we apply pair-wise utterance consistency with Maximum Mean Discrepancy constraint and adversarial training. Experimental results on a dataset of 29 subjects speaking two utterances demonstrated that our framework is able to synthesize speech audio waveforms from weighting maps, outperforming conventional convolution and transformer models.

SD · Feb 9, 2021

A comparative study of two-dimensional vocal tract acoustic modeling based on Finite-Difference Time-Domain methods

Debasish Ray Mohapatra, Victor Zappi, Sidney Fels

Two-dimensional (2D) numerical approaches to vocal tract (VT) modelling can offer a better balance between low computational cost and accurate rendering of acoustic wave propagation. However, they require a high spatio-temporal resolution in the numerical scheme for a precise estimation of acoustic formants, at the expense of simulation run time. We have recently proposed a new VT acoustic modelling technique, known as the 2.5D Finite-Difference Time-Domain (2.5D FDTD), which extends the existing 2D FDTD approach by adding tube depth to its acoustic wave solver. In this work, first, the simulated acoustic outputs of our new model are shown to be comparable with the 2D FDTD and a realistic 3D FEM VT model at a low spatio-temporal resolution. Next, a radiation model is developed by including a circular baffle around the VT as head geometry. The transfer functions of the radiation model are analyzed using five different vocal tract shapes for vowel sounds /a/, /e/, /i/, /o/ and /u/.

SD · Feb 2, 2021

SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer

Pramit Saha, Debasish Ray Mohapatra, Sidney Fels

This work presents our advancements in controlling an articulatory speech synthesis engine, viz., Pink Trombone, with hand gestures. Our interface translates continuous finger movements and wrist flexion into continuous speech using vocal tract area-function based articulatory speech synthesis. We use Cyberglove II with 18 sensors to capture the kinematic information of the wrist and the individual fingers, in order to control a virtual tongue. The coordinates and the bending values of the sensors are then utilized to fit a spline tongue model that smooths out the noisy values and outliers. Considering the upper palate as fixed and the spline model as the dynamically moving lower surface (tongue) of the vocal tract, we compute 1D area-function values that are fed to the Pink Trombone, generating continuous speech sounds. Therefore, by learning to manipulate one's wrist and fingers, one can learn to produce speech sounds just through one's hands, without the need for using the vocal tract.
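
The palate-to-tongue computation described here can be sketched as follows. This is an illustrative simplification rather than the authors' implementation: it assumes the palate and tongue heights are sampled at the same positions along the tract and that each cross-section is circular with the gap as its diameter. The sample values are hypothetical.

```python
import math


def area_function(palate_y, tongue_y):
    """Cross-sectional areas from palate and tongue heights along the tract."""
    areas = []
    for p, t in zip(palate_y, tongue_y):
        gap = max(p - t, 0.0)  # tongue touching the palate closes the tract
        areas.append(math.pi * (gap / 2.0) ** 2)  # assumed circular section
    return areas


# Hypothetical heights in cm at five positions from glottis to lips.
palate = [2.0, 2.2, 2.4, 2.2, 2.0]
tongue = [0.5, 1.0, 2.4, 1.2, 0.8]
print(area_function(palate, tongue))
```

A full closure (third sample) yields zero area, which is how a stop consonant would appear in the area function handed to the synthesizer.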

HC · Oct 27, 2020

New interfaces for musical expression

Ivan Poupyrev, Michael J. Lyons, Sidney Fels et al.

The rapid evolution of electronics, digital media, advanced materials, and other areas of technology is opening up unprecedented opportunities for musical interface inventors and designers. The possibilities afforded by these new technologies carry with them the challenges of a complex and often confusing array of choices for musical composers and performers. New musical technologies are at least partly responsible for the current explosion of new musical forms, some of which are controversial and challenge traditional definitions of music. Alternative musical controllers, currently the leading edge of the ongoing dialogue between technology and musical culture, involve many of the issues covered at past CHI meetings. This workshop brings together interface experts interested in musical controllers and musicians and composers involved in the development of new musical interfaces.

IV · Jun 29, 2020

Ultra2Speech -- A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images

Pramit Saha, Yadong Liu, Bryan Gick et al.

Every year, thousands of individuals need surgical removal of their larynx due to critical diseases and therefore require an alternative form of communication to articulate speech sounds after the loss of their voice box. This work addresses the articulatory-to-acoustic mapping problem based on ultrasound (US) tongue images for the development of a silent-speech interface (SSI) that can assist them in their daily interactions. Our approach targets automatically extracting tongue movement information by selecting an optimal feature set from US images and mapping these features to the acoustic space. We use a novel deep learning architecture, which we call Ultrasound2Formant (U2F) Net, to map US tongue images from a US probe placed beneath a subject's chin to formants. It uses hybrid spatio-temporal 3D convolutions followed by feature shuffling for the estimation and tracking of vowel formants from US images. The formant values are then used to synthesize continuous time-varying vowel trajectories via a Klatt synthesizer. Our best model achieves an R-squared (R^2) measure of 99.96% for the regression task. Our network lays the foundation for an SSI, as it successfully tracks the tongue contour automatically as an internal representation without any explicit annotation.

AS · May 16, 2020

Learning Joint Articulatory-Acoustic Representations with Normalizing Flows

Pramit Saha, Sidney Fels

The articulatory geometric configurations of the vocal tract and the acoustic properties of the resultant speech sound are considered to have a strong causal relationship. This paper aims at finding a joint latent representation between the articulatory and acoustic domains for vowel sounds via invertible neural network models, while simultaneously preserving the respective domain-specific features. Our model utilizes a convolutional autoencoder architecture and normalizing flow-based models to allow both forward and inverse mappings in a semi-supervised manner, between the mid-sagittal vocal tract geometry of a two degrees-of-freedom articulatory synthesizer with 1D acoustic wave model and the Mel-spectrogram representation of the synthesized speech sounds. Our approach performs satisfactorily on both articulatory-to-acoustic and acoustic-to-articulatory mapping, thereby demonstrating a successful joint encoding of both domains.

LG · Dec 11, 2019

Variational Learning with Disentanglement-PyTorch

Amir H. Abdi, Purang Abolmaesumi, Sidney Fels

Unsupervised learning of disentangled representations is an open problem in machine learning. The Disentanglement-PyTorch library is developed to facilitate research, implementation, and testing of new variational algorithms. In this modular library, neural architectures, dimensionality of the latent space, and the training algorithms are fully decoupled, allowing for independent and consistent experiments across variational methods. The library handles the training scheduling, logging, and visualizations of reconstructions and latent space traversals. It also evaluates the encodings based on various disentanglement metrics. The library, so far, includes implementations of the following unsupervised algorithms: VAE, Beta-VAE, Factor-VAE, DIP-I-VAE, DIP-II-VAE, Info-VAE, and Beta-TCVAE, as well as conditional approaches such as CVAE and IFCVAE. The library is compatible with the Disentanglement Challenge of NeurIPS 2019, hosted on AICrowd, and ranked 3rd in both the first and second stages of the challenge.
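
For orientation, the Beta-VAE objective listed above weights the KL term of the standard VAE loss by a factor beta. The following is a dependency-free sketch of that objective for a diagonal Gaussian posterior and a standard normal prior. The function names are hypothetical and are not the Disentanglement-PyTorch API.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))


def beta_vae_loss(recon_error, mu, logvar, beta=4.0):
    """Reconstruction error plus beta-weighted KL; beta = 1 gives the plain VAE."""
    return recon_error + beta * gaussian_kl(mu, logvar)


# Hypothetical encoder output for a 3-dimensional latent space.
print(beta_vae_loss(recon_error=12.5, mu=[0.3, -0.1, 0.0],
                    logvar=[-0.2, 0.0, 0.1]))
```

The default beta of 4.0 is only a placeholder; in practice it is a tuned hyper-parameter.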

IV · Dec 5, 2019

A Study into Echocardiography View Conversion

Amir H. Abdi, Mohammad H. Jafari, Sidney Fels et al.

Transthoracic echo is one of the most common means of cardiac study in clinical routine. During the echo exam, the sonographer captures a set of standard cross sections (echo views) of the heart. Each 2D echo view cuts through the 3D cardiac geometry via a unique plane. Consequently, different views share some limited information. In this work, we investigate the feasibility of generating a 2D echo view using another view based on adversarial generative models. The objective optimized to train the view-conversion model is based on the ideas introduced by LSGAN, PatchGAN and Conditional GAN (cGAN). The size and length of the left ventricle in the generated target echo view are compared against those of the target ground truth to assess the validity of the echo view conversion. Results show that there is a correlation of 0.50 between the LV areas and 0.49 between the LV lengths of the generated target frames and the real target frames.
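
The abstract does not say which correlation coefficient was used. Assuming it is the Pearson coefficient, the comparison between generated and real left-ventricle measurements could be computed as in this sketch. The measurement values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical LV areas (cm^2) in real frames and in generated frames.
real_area = [30.1, 28.4, 35.0, 31.2, 26.7]
generated_area = [29.0, 30.2, 33.1, 28.9, 27.5]
print(pearson_r(real_area, generated_area))
```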

LG · Nov 26, 2019

A Preliminary Study of Disentanglement With Insights on the Inadequacy of Metrics

Amir H. Abdi, Purang Abolmaesumi, Sidney Fels

Disentangled encoding is an important step towards better representation learning. However, despite numerous efforts, there is still no clear winner that captures the independent features of the data in an unsupervised fashion. In this work we empirically evaluate the performance of six unsupervised disentanglement approaches on the mpi3d toy dataset curated and released for the NeurIPS 2019 Disentanglement Challenge. The methods investigated in this work are Beta-VAE, Factor-VAE, DIP-I-VAE, DIP-II-VAE, Info-VAE, and Beta-TCVAE. The capacities of all models were progressively increased throughout the training and the hyper-parameters were kept intact across experiments. The methods were evaluated based on five disentanglement metrics, namely, DCI, Factor-VAE, IRS, MIG, and SAP-Score. Within the limitations of this study, the Beta-TCVAE approach was found to outperform its alternatives with respect to the normalized sum of metrics. However, a qualitative study of the encoded latents reveals that there is not a consistent correlation between the reported metrics and the disentanglement potential of the model.

HC · Sep 25, 2019

EEG-to-F0: Establishing artificial neuro-muscular pathway for kinematics-based fundamental frequency control

Himanshu Goyal, Pramit Saha, Bryan Gick et al.

The fundamental frequency (F0) of the human voice is generally controlled by changing the vocal fold parameters (including tension, length and mass), which in turn are manipulated by the muscle exciters, activated by the neural synergies. In order to begin investigating the neuromuscular F0 control pathway, we simulate a simple biomechanical arm prototype (instead of an artificial vocal tract) that controls the F0 of an artificial sound synthesiser based on elbow movements. The intended arm movements are decoded from the EEG signal inputs (collected simultaneously with the kinematic hand data of the participant) through a combined machine learning and biomechanical modeling strategy. The machine learning model is employed to identify the muscle activation of a single-muscle arm model in ArtiSynth (from the input brain signal), in order to match the actual kinematic (elbow joint angle) data. The biomechanical model utilises this estimated muscle excitation to produce corresponding changes in elbow angle, which is then linearly mapped to the F0 of a vocal sound synthesiser. We use the F0 value mapped from the actual kinematic hand data (via the same function) as the ground truth and compare it against the F0 estimated from the brain signal. A detailed qualitative and quantitative performance comparison shows that the proposed neuromuscular pathway can indeed be utilised to accurately control the vocal fundamental frequency, thereby demonstrating the success of our closed loop neuro-biomechanical control scheme.

SD · Sep 19, 2019

An extended two-dimensional vocal tract model for fast acoustic simulation of single-axis symmetric three-dimensional tubes

Debasish Ray Mohapatra, Victor Zappi, Sidney Fels

The simulation of two-dimensional (2D) wave propagation is an affordable computational task, and its use can potentially improve time performance in the acoustic analysis of vocal tracts. Several models have been designed that rely on 2D wave solvers and include 2D representations of three-dimensional (3D) vocal tract-like geometries. However, until now, only the acoustics of straight 3D tubes with circular cross-sections have been successfully replicated with this approach. Furthermore, the simulation of the resulting 2D shapes requires extremely high spatio-temporal resolutions, dramatically reducing the speed boost deriving from the usage of a 2D wave solver. In this paper, we introduce a novel, in-progress vocal tract model (2.5D FDTD) that extends the 2D Finite-Difference Time-Domain wave solver by adding tube depth, derived from the area functions, to the acoustic solver. The model combines the speed of a light 2D numerical scheme with the ability to natively simulate 3D tubes that are symmetric in one dimension, hence relaxing previous resolution requirements. An implementation of the 2.5D FDTD is presented, along with evaluation of its performance in the case of static vowel modeling. The paper discusses the current features and limits of the approach, and the potential impact on computational acoustics applications.
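
To make the finite-difference time-domain idea concrete, here is a minimal 1D staggered-grid sketch of the linear acoustic equations in a rigid-walled tube. It is a deliberate simplification and not the 2.5D scheme of the paper: there is no tube depth, no second spatial dimension, and no wall losses. The function name, grid sizes, and air constants are assumptions chosen for illustration.

```python
def fdtd_1d(n_cells=200, n_steps=100, c=350.0, rho=1.14, dx=1e-3, courant=0.9):
    """Leapfrog update of pressure and particle velocity in a closed 1D tube.

    The time step follows from the Courant number, which must stay <= 1
    for the explicit scheme to remain stable in 1D.
    """
    dt = courant * dx / c
    p = [0.0] * n_cells          # pressure at cell centres
    v = [0.0] * (n_cells + 1)    # velocity at cell faces; end faces stay 0 (rigid)
    p[n_cells // 2] = 1.0        # initial pressure pulse in the middle
    for _ in range(n_steps):
        # Momentum equation: dv/dt = -(1/rho) * dp/dx
        for i in range(1, n_cells):
            v[i] -= dt / (rho * dx) * (p[i] - p[i - 1])
        # Continuity equation: dp/dt = -rho * c^2 * dv/dx
        for i in range(n_cells):
            p[i] -= rho * c * c * dt / dx * (v[i + 1] - v[i])
    return p


pressure = fdtd_1d()
print(sum(pressure))  # total pressure is conserved between rigid walls
```

Halving `dx` doubles the number of cells and, through the Courant condition, also halves `dt`, which is why the resolution requirements discussed in the abstract translate directly into run time.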

LG · Jun 27, 2019

Variational Shape Completion for Virtual Planning of Jaw Reconstructive Surgery

Amir H. Abdi, Mehran Pesteie, Eitan Prisman et al.

The premorbid geometry of the mandible is of significant relevance in jaw reconstructive surgeries and occasionally unknown to the surgical team. In this paper, an optimization framework is introduced to train deep models for completion (reconstruction) of the missing segments of the bone based on the remaining healthy structure. To leverage the contextual information of the surroundings of the dissected region, the voxel-weighted Dice loss is introduced. To address the non-deterministic nature of the shape completion problem, we leverage a weighted multi-target probabilistic solution which is an extension of the conditional variational autoencoder (CVAE). This approach considers multiple targets as acceptable reconstructions, each weighted according to its conformity with the original shape. We quantify the performance gain of the proposed method against similar algorithms, including CVAE, where we report statistically significant improvements in both deterministic and probabilistic paradigms. The probabilistic model is also evaluated on its ability to generate anatomically relevant variations for the missing bone. As a unique aspect of this work, the model is tested on real surgical cases where the clinical relevance of its reconstructions and their compliance with the surgeon's virtual plan are demonstrated as necessary steps towards clinical adoption.
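
The abstract does not give the exact form of the voxel-weighted Dice loss. One plausible reading, shown here purely as an assumption, multiplies each voxel's contribution by a weight before forming the usual soft Dice ratio. The weights and volumes below are hypothetical and flattened to 1D lists for brevity.

```python
def weighted_dice_loss(pred, target, weights, eps=1e-8):
    """Soft Dice loss with a per-voxel weight (assumed formulation)."""
    overlap = sum(w * p * t for w, p, t in zip(weights, pred, target))
    total = (sum(w * p for w, p in zip(weights, pred))
             + sum(w * t for w, t in zip(weights, target)))
    return 1.0 - (2.0 * overlap + eps) / (total + eps)


# Hypothetical flattened volume: voxels near the resected region weigh more.
pred = [0.9, 0.8, 0.1, 0.0]
target = [1.0, 1.0, 0.0, 0.0]
weights = [2.0, 2.0, 1.0, 1.0]
print(weighted_dice_loss(pred, target, weights))
```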

LG · Apr 8, 2019

SPEAK YOUR MIND! Towards Imagined Speech Recognition With Hierarchical Deep Learning

Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels

Speech-related Brain Computer Interface (BCI) technologies provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals. In order to infer imagined speech from active thoughts, we propose a novel hierarchical deep learning BCI system for subject-independent classification of 11 speech tokens including phonemes and words. Our novel approach exploits predicted articulatory information of six phonological categories (e.g., nasal, bilabial) as an intermediate step for classifying the phonemes and words, thereby finding discriminative signal responsible for natural speech synthesis. The proposed network is composed of a hierarchical combination of spatial and temporal CNNs cascaded with a deep autoencoder. Our best models on the KARA database achieve an average accuracy of 83.42% across the six different binary phonological classification tasks, and 53.36% for the individual token identification task, significantly outperforming our baselines. Ultimately, our work suggests the possible existence of a brain imagery footprint for the underlying articulatory movement related to different sounds that can be used to aid imagined speech decoding.

LG · Apr 8, 2019

Deep Learning the EEG Manifold for Phonological Categorization from Active Thoughts

Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels

Speech-related Brain Computer Interfaces (BCI) aim primarily at finding an alternative vocal communication pathway for people with speaking disabilities. As a step towards full decoding of imagined speech from active thoughts, we present a BCI system for subject-independent classification of phonological categories exploiting a novel deep learning based hierarchical feature extraction scheme. To better capture the complex representation of high-dimensional electroencephalography (EEG) data, we compute the joint variability of EEG electrodes into a channel cross-covariance matrix. We then extract the spatio-temporal information encoded within the matrix using a mixed deep neural network strategy. Our model framework is composed of a convolutional neural network (CNN), a long short-term memory network (LSTM), and a deep autoencoder. We train the individual networks hierarchically, feeding their combined outputs into a final gradient boosting classification step. Our best models achieve an average accuracy of 77.9% across five different binary classification tasks, providing a significant 22.5% improvement over previous methods. As we also show visually, our work demonstrates that the speech imagery EEG possesses significant discriminative information about the intended articulatory movements responsible for natural speech synthesis.

LG · Apr 8, 2019

Hierarchical Deep Feature Learning For Decoding Imagined Speech From EEG

Pramit Saha, Sidney Fels

We propose a mixed deep neural network strategy, incorporating a parallel combination of Convolutional (CNN) and Recurrent Neural Networks (RNN), cascaded with deep autoencoders and fully connected layers, towards automatic identification of imagined speech from EEG. Instead of utilizing raw EEG channel data, we compute the joint variability of the channels in the form of a covariance matrix that provides spatio-temporal representations of EEG. The networks are trained hierarchically and the extracted features are passed onto the next network hierarchy until the final classification. Using a publicly available EEG based speech imagery database, we demonstrate an improvement in accuracy of around 23.45% over the baseline method. Our approach demonstrates the promise of a mixed DNN approach for complex spatial-temporal classification problems.
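
The covariance feature described here can be sketched without any dependencies. This is an illustrative version using the unbiased sample covariance; the abstract does not state the normalization or any windowing the authors applied, so treat those details as assumptions. The recording is hypothetical.

```python
def channel_covariance(eeg):
    """Channel-by-channel sample covariance of a trial.

    `eeg` is a list of channels, each a list of time samples of equal length.
    """
    n_samples = len(eeg[0])
    means = [sum(channel) / n_samples for channel in eeg]
    centered = [[s - m for s in channel] for channel, m in zip(eeg, means)]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n_samples - 1)
             for cj in centered]
            for ci in centered]


# Hypothetical 3-channel recording with 4 samples per channel.
trial = [[1.0, 2.0, 3.0, 4.0],
         [2.0, 4.0, 6.0, 8.0],
         [4.0, 3.0, 2.0, 1.0]]
for row in channel_covariance(trial):
    print(row)
```

The result is a symmetric channels-by-channels matrix, so its size depends on the number of electrodes rather than on the trial length, which makes it a fixed-size input for the downstream networks.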

HC · Feb 10, 2019

Human Computer Interaction Design for Mobile Devices Based on a Smart Healthcare Architecture

Pu Liu, Sidney Fels, Nicholas West et al.

Smart and IoT-enabled mobile devices have the potential to enhance healthcare services for both patients and healthcare providers. Human computer interaction design is key to realizing a useful and usable connection between the users and these smart healthcare technologies. Appropriate design of such devices enhances the usability, improves effective operation in an integrated healthcare system, and facilitates the collaboration and information sharing between patients, healthcare providers, and institutions. In this paper, the concept of smart healthcare is introduced, including its four-layer information architecture of sensing, communication, data integration, and application. Human Computer Interaction design principles for smart healthcare mobile devices are outlined, based on user-centered design. These include: ensuring safety, providing error-resistant displays and alarms, supporting the unique relationship between patients and healthcare providers, distinguishing end-user groups, accommodating legacy devices, guaranteeing low latency, allowing for personalization, and ensuring patient privacy. Results are synthesized in design suggestions ranging from personas, scenarios, workflow, and information architecture, to prototyping, testing and iterative development. Finally, future developments in smart healthcare and Human Computer Interaction design for mobile health devices are outlined.

SD · Nov 20, 2018

Sound-Stream II: Towards Real-Time Gesture Controlled Articulatory Sound Synthesis

Pramit Saha, Debasish Ray Mohapatra, Praneeth SV et al.

We present an interface involving four degrees-of-freedom (DOF) mechanical control of a two-dimensional, mid-sagittal tongue through a biomechanical toolkit called ArtiSynth and a sound synthesis engine called JASS towards articulatory sound synthesis. As a demonstration of the project, the user will learn to produce a range of JASS vocal sounds by varying the shape and position of the ArtiSynth tongue in 2D space through a set of four force-based sensors. In other words, the user will be able to physically play around with these four sensors, thereby virtually controlling the magnitude of four selected muscle excitations of the tongue to vary the articulatory structure. This variation is computed in terms of area functions in the ArtiSynth environment and communicated to the JASS-based audio synthesizer, coupled with a two-mass glottal excitation model, to complete this end-to-end gesture-to-sound mapping.

SD · Nov 19, 2018

Limitations of Source-Filter Coupling In Phonation

Debasish Ray Mohapatra, Sidney Fels

The coupling of vocal fold (source) and vocal tract (filter) is one of the most critical factors in source-filter articulation theory. The traditional linear source-filter theory has been challenged by current research, which clearly shows the impact of acoustic loading on the dynamic behavior of the vocal fold vibration as well as on variations in the shape of the glottal flow pulses. This paper outlines the underlying mechanism of source-filter interactions and demonstrates the design and working principles of coupling for various existing vocal cord and vocal tract biomechanical models. For our study, we have considered self-oscillating lumped-element models of the acoustic source and computational models of the vocal tract as articulators. To understand the limitations of source-filter interactions associated with each of those models, we compare them with respect to their mechanical design, acoustic and physiological characteristics, and aerodynamic simulation.

LG · Sep 17, 2018

Muscle Excitation Estimation in Biomechanical Simulation Using NAF Reinforcement Learning

Amir H. Abdi, Pramit Saha, Praneeth Srungarapu et al.

Motor control is a set of time-varying muscle excitations which generate desired motions for a biomechanical system. Muscle excitations cannot be directly measured from live subjects. An alternative approach is to estimate muscle activations using inverse motion-driven simulation. In this article, we propose a deep reinforcement learning method to estimate the muscle excitations in simulated biomechanical systems. Here, we introduce a custom-made reward function which incentivizes faster point-to-point tracking of target motion. Moreover, we deploy two new techniques, namely, episode-based hard update and dual buffer experience replay, to avoid feedback training loops. The proposed method is tested in four simulated 2D and 3D environments with 6 to 24 axial muscles. The results show that the models were able to learn muscle excitations for given motions after nearly 100,000 simulated steps. Moreover, the root mean square error in point-to-point reaching of the target across experiments was less than 1% of the length of the domain of motion. Our reinforcement learning method differs from conventional dynamic approaches in that the muscle control is derived functionally by a set of distributed neurons. This can open paths for neural activity interpretation of this phenomenon.

SD · Jul 29, 2018

Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

Pramit Saha, Praneeth Srungarapu, Sidney Fels

Vocal tract configurations play a vital role in generating distinguishable speech sounds, by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Networks (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN-based deep hierarchical visual feature extractor with recurrent networks, which ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in the model performance in the context of speech classification with respect to generic sequence or video classification tasks.

SD · Dec 17, 2015

Spectral Study of the Vocal Tract in Vowel Synthesis: A Comparison between 1D and 3D Acoustic Analysis

Negar M. Harandi, Daniel Aalto, Antti Hannukainen et al.

A state-of-the-art 1D acoustic synthesizer has been previously developed and coupled to speaker-specific biomechanical models of the oropharynx in ArtiSynth. As expected, the formant frequencies of the synthesized vowel sounds were shown to be different from those of the recorded audio. Such discrepancy was hypothesized to be due to the simplified geometry of the vocal tract model as well as the one-dimensional implementation of the Navier-Stokes equations. In this paper, we calculate Helmholtz resonances of our vocal tract geometries using the 3D finite element method (FEM), and compare them with the formant frequencies obtained from the 1D method and audio. We hope such a comparison helps clarify the limitations of our current models and/or speech synthesizer.
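
As a point of reference for what a 1D description predicts, the textbook idealization of the vocal tract as a lossless uniform tube, closed at the glottis and open at the lips, has resonances at odd multiples of c / (4L). The sketch below is that idealization only, not the synthesizer or the FEM analysis of the paper. The tube length and speed of sound are assumed values.

```python
def uniform_tube_resonances(length_m, n_modes=3, c=350.0):
    """Resonances (Hz) of an ideal tube closed at one end and open at the other."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, n_modes + 1)]


# Assumed 17.5 cm tract and c = 350 m/s: roughly 500, 1500, 2500 Hz.
print(uniform_tube_resonances(0.175))
```

Real vocal tract geometries bend, branch, and vary in cross-section, which is one reason 3D resonance analysis can depart from such 1D estimates.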