SD Jul 26, 2024
Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
Hyun Jin Park, Dhruuv Agarwal, Neng Chen et al.
This paper explores the use of TTS synthesized training data for the keyword spotting (KWS) task while minimizing development cost and time. Keyword spotting models require a huge amount of training data to be accurate, and obtaining such training data can be costly. In the current state of the art, TTS models can generate large amounts of natural-sounding data, which can help reduce the cost and time of KWS model development. Still, TTS-generated data can lack diversity compared to real data. To maximize KWS model accuracy under the constraint of limited resources and current TTS capability, we explored various strategies to mix TTS data and real human speech data, with a focus on minimizing real data use and maximizing diversity of TTS output. Our experimental results indicate that relatively small amounts of real audio data with speaker diversity (100 speakers, 2k utterances) and large amounts of TTS synthesized data can achieve reasonably high accuracy (within 3x the error rate of the baseline) compared to the baseline (trained with 3.8M real positive utterances).
SD Aug 20, 2024
Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
Hyun Jin Park, Dhruuv Agarwal, Neng Chen et al.
The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded accuracy on real speech. To address this issue, we propose applying an adversarial training method to prevent the KWS model from learning TTS-specific features when trained on large amounts of TTS data. Experimental results demonstrate that KWS model accuracy on real speech data can be improved by up to 12% when adversarial loss is used in addition to the original KWS loss. Surprisingly, we also observed that the adversarial setup improves accuracy by up to 8%, even when trained solely on TTS and real negative speech data, without any real positive examples.
AI Jul 7, 2025
MedGemma Technical Report
Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri et al.
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching comparable performance to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and, as an encoder, achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.
HC Oct 21, 2021
Towards Automatic Grading of D3.js Visualizations
Matthew Hull, Connor Guerin, Justin Chen et al.
Manually grading D3 data visualizations is a challenging endeavor, and is especially difficult for large classes with hundreds of students. Grading an interactive visualization requires a combination of interactive, quantitative, and qualitative evaluations that are conventionally done manually and are difficult to scale up as the visualization complexity, data size, and number of students increase. We present a first-of-its-kind automatic grading method for D3 visualizations that scalably and precisely evaluates the data bindings, visual encodings, interactions, and design specifications used in a visualization. Our method has shown potential to enhance students' learning experience, enabling them to submit their code frequently and receive rapid feedback to better inform iteration and improvement to their code and visualization design. Our method promotes consistent grading and enables instructors to dedicate more focus to assisting students in gaining visualization knowledge and experience. We have successfully deployed our method and auto-graded D3 submissions from more than 1000 undergraduate and graduate students in Georgia Tech's CSE6242 Data and Visual Analytics course, and received positive feedback and encouragement for expanding its adoption.
SE Aug 8, 2020
FrUITeR: A Framework for Evaluating UI Test Reuse
Yixue Zhao, Justin Chen, Adriana Sejfia et al.
UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR, a framework that automatically evaluates UI test reuse in a reproducible way. We apply FrUITeR to existing test-reuse techniques on a uniform benchmark we established, resulting in 11,917 test reuse cases from 20 apps. We report several key findings aimed at improving UI test reuse that are missed by existing work.
AC Sep 5, 2019
Free resolutions of function classes via order complexes
Justin Chen, Christopher Eur, Greg Yang et al.
Function classes are collections of Boolean functions on a finite set, which are fundamental objects of study in theoretical computer science. We study algebraic properties of ideals associated to function classes previously defined by the third author. We consider the broad family of intersection-closed function classes, and describe cellular free resolutions of their ideals by order complexes of the associated posets. For function classes arising from matroids, polyhedral cell complexes, and more generally interval Cohen-Macaulay posets, we show that the multigraded Betti numbers are pure, and are given combinatorially by the Möbius functions. We then apply our methods to derive bounds on the VC dimension of some important families of function classes in learning theory.
LG May 7, 2019
CrossTrainer: Practical Domain Adaptation with Loss Reweighting
Justin Chen, Edward Gan, Kexin Rong et al.
Domain adaptation provides a powerful set of model training techniques given domain-specific training data and supplemental data with unknown relevance. The techniques are useful when users need to develop models with data from varying sources, of varying quality, or from different time ranges. We build CrossTrainer, a system for practical domain adaptation. CrossTrainer utilizes loss reweighting, which provides consistently high model accuracy across a variety of datasets in our empirical analysis. However, loss reweighting is sensitive to the choice of a weight hyperparameter that is expensive to tune. We develop optimizations leveraging unique properties of loss reweighting that allow CrossTrainer to output accurate models while improving training time compared to naive hyperparameter search.
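As a rough illustration of the loss reweighting idea mentioned in this abstract, the sketch below combines the mean loss on domain-specific (target) data with the mean loss on supplemental (source) data using a single weight. The convex-combination form, the function name `reweighted_loss`, and the parameter name `alpha` are assumptions made for illustration; the abstract does not give CrossTrainer's exact formulation or its tuning optimizations.

```python
import numpy as np


def reweighted_loss(target_losses, source_losses, alpha):
    """Blend two sets of per-example losses with one weight (illustrative only).

    alpha = 0 uses only the target (domain-specific) data;
    alpha = 1 uses only the source (supplemental) data.
    This convex-combination form is an assumption, not taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    target_term = float(np.mean(target_losses))
    source_term = float(np.mean(source_losses))
    return (1.0 - alpha) * target_term + alpha * source_term


# Hypothetical per-example losses from some model on each dataset.
target = np.array([1.0, 1.0])
source = np.array([3.0, 3.0])
blended = reweighted_loss(target, source, alpha=0.5)
```

A naive hyperparameter search would retrain the model once per candidate value of `alpha` and keep the one with the best validation accuracy on target data, which is the expensive step the abstract says CrossTrainer speeds up.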
NE May 17, 2016
Combinatorially Generated Piecewise Activation Functions
Justin Chen
In the neuroevolution literature, research has primarily focused on evolving the number of nodes, connections, and weights in artificial neural networks. Few attempts have been made to evolve activation functions. Research in evolving activation functions has mainly focused on evolving function parameters, and developing heterogeneous networks by selecting from a fixed pool of activation functions. This paper introduces a novel technique for evolving heterogeneous artificial neural networks through combinatorially generating piecewise activation functions to enhance expressive power. I demonstrate this technique on NeuroEvolution of Augmenting Topologies using ArcTan and Sigmoid, and show that it outperforms the original algorithm on non-Markovian double pole balancing. This technique expands the landscape of unconventional activation functions by demonstrating that they are competitive with canonical choices, and introduces a purview for further exploration of automatic model selection for artificial neural networks.
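One possible reading of "combinatorially generating piecewise activation functions" is sketched below: take a small pool of base functions (here ArcTan and Sigmoid, as named in the abstract) and build every ordered pair, using one function for negative inputs and the other for non-negative inputs. The split point at zero, the helper names, and the naming scheme for the generated functions are assumptions for illustration; the abstract does not specify how the pieces are joined.

```python
import itertools
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


BASE_FUNCTIONS = {"arctan": math.atan, "sigmoid": sigmoid}


def make_piecewise(neg_fn, pos_fn):
    """Return a function that applies neg_fn for x < 0 and pos_fn otherwise."""
    def activation(x):
        return neg_fn(x) if x < 0 else pos_fn(x)
    return activation


def generate_piecewise(base):
    """Build every ordered (negative-side, positive-side) pairing of base functions."""
    return {
        f"{neg}|{pos}": make_piecewise(base[neg], base[pos])
        for neg, pos in itertools.product(base, repeat=2)
    }


candidates = generate_piecewise(BASE_FUNCTIONS)
```

With two base functions this yields four candidates, two of which are the original functions themselves. Mixed pairings such as `arctan|sigmoid` are discontinuous at zero under this zero-split assumption.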