CLNov 10, 2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language UnderstandingYifan Peng, Siddhant Arora, Yosuke Higuchi et al. · cmu, nvidia
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LM) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and observe self-supervised pre-trained models to be more powerful, with pre-trained LM and speech models being most beneficial for the Sentiment Analysis and Named Entity Recognition task, respectively.
CLNov 29, 2021Code
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnetSiddhant Arora, Siddharth Dalmia, Pavel Denisov et al.
As Automatic Speech Processing (ASR) systems are getting better, there is an increasing interest of using the ASR output to do downstream Natural Language Processing (NLP) tasks. However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks. Hence, there is a need to build an open source standard that can be used to have a faster start into SLU research. We present ESPnet-SLU, which is designed for quick development of spoken language understanding in a single framework. ESPnet-SLU is a project inside end-to-end speech processing toolkit, ESPnet, which is a widely used open-source standard for various speech processing tasks like ASR, Text to Speech (TTS) and Speech Translation (ST). We enhance the toolkit to provide implementations for various SLU benchmarks that enable researchers to seamlessly mix-and-match different ASR and NLU models. We also provide pretrained models with intensively tuned hyper-parameters that can match or even outperform the current state-of-the-art performances. The toolkit is publicly available at https://github.com/espnet/espnet.
ARAug 19, 2019Code
Boosting the Bounds of Symbolic QED for Effective Pre-Silicon Verification of Processor CoresKarthik Ganesan, Srinivasa Shashank Nuthakki
Existing techniques to ensure functional correctness and hardware trust during pre-silicon verification face severe limitations. In this work, we systematically leverage two key ideas: 1) Symbolic Quick Error Detection (Symbolic QED or SQED), a recent bug detection and localization technique using Bounded Model Checking (BMC); and 2) Symbolic starting states, to present a method that: i) Effectively detects both "difficult" logic bugs and Hardware Trojans, even with long activation sequences where traditional BMC techniques fail; and ii) Does not need skilled manual guidance for writing testbenches, writing design-specific assertions, or debugging spurious counter-examples. Using open-source RISC-V cores, we demonstrate the following: 1. Quick (<5 minutes for an in-order scalar core and <2.5 hours for an out-of-order superscalar core) detection of 100% of hundreds of logic bug and hardware Trojan scenarios from commercial chips and research literature, and 97.9% of "extremal" bugs (randomly-generated bugs requiring ~100,000 activation instructions taken from random test programs). 2. Quick (~1 minute) detection of several previously unknown bugs in open-source RISC-V designs.
93.8LGMay 3
ZeRO-Prefill: Zero Redundancy Overheads in MoE Prefill ServingZhaoyuan Su, Olatunji Ruwase, Karthik Ganesan et al.
Production LLM workloads increasingly serve discriminative tasks, such as classification, recommendation, and verification, whose answers are read from the logits of a single prefill pass with no autoregressive decoding. Serving these prefill-only workloads on mixture-of-experts (MoE) models is bottlenecked not by compute but by the distributed execution required to fit the model: existing parallel strategies (tensor, expert, and pipeline parallelism) trade memory pressure for redundant computation, communication, and synchronization, severely degrading MoE prefill serving efficiency. We observe that these overheads stem from coupling expert placement with synchronous activation routing -- a design inherited from the decoding era. The long, compute-bound forward passes of large-batch prefill open a per-layer window wide enough to stream expert weights in the background, replacing per-layer activation AllToAll with asynchronous weight AllGather fully overlapped with computation. We propose ZeRO-Prefill, a prefill-only serving system whose backend, AsyncEP (Asynchronous Expert Parallelism), gathers experts by weight rather than routing them by activation, and whose frontend co-enforces a physically-derived saturation threshold through prefix-aware routing and true-FLOPs load tracking. On Qwen3-235B-A22B across four hardware/precision configurations, ZeRO-Prefill delivers 1.35-1.37x throughput over the strongest distributed baseline on real-world workloads and up to 1.59x on long-context synthetic workloads, sustaining 29.8-36.2% per-GPU model FLOPs utilization.
CLJun 11, 2021
N-Best ASR Transformer: Enhancing SLU Performance using Multiple ASR HypothesesKarthik Ganesan, Pakhi Bamdev, Jaivarsan B et al.
Spoken Language Understanding (SLU) systems parse speech into semantic structures like dialog acts and slots. This involves the use of an Automatic Speech Recognizer (ASR) to transcribe speech into multiple text alternatives (hypotheses). Transcription errors, common in ASRs, impact downstream SLU performance negatively. Approaches to mitigate such errors involve using richer information from the ASR, either in form of N-best hypotheses or word-lattices. We hypothesize that transformer models learn better with a simpler utterance representation using the concatenation of the N-best ASR alternatives, where each alternative is separated by a special delimiter [SEP]. In our work, we test our hypothesis by using concatenated N-best ASR alternatives as the input to transformer encoder models, namely BERT and XLM-RoBERTa, and achieve performance equivalent to the prior state-of-the-art model on DSTC2 dataset. We also show that our approach significantly outperforms the prior state-of-the-art when subjected to the low data regime. Additionally, this methodology is accessible to users of third-party ASR APIs which do not provide word-lattice information.
ARMay 9, 2015
TPAD: Hardware Trojan Prevention and Detection for Trusted Integrated CircuitsTony F. Wu, Karthik Ganesan, Yunqing Alexander Hu et al.
There are increasing concerns about possible malicious modifications of integrated circuits (ICs) used in critical applications. Such attacks are often referred to as hardware Trojans. While many techniques focus on hardware Trojan detection during IC testing, it is still possible for attacks to go undetected. Using a combination of new design techniques and new memory technologies, we present a new approach that detects a wide variety of hardware Trojans during IC testing and also during system operation in the field. Our approach can also prevent a wide variety of attacks during synthesis, place-and-route, and fabrication of ICs. It can be applied to any digital system, and can be tuned for both traditional and split-manufacturing methods. We demonstrate its applicability for both ASICs and FPGAs. Using fabricated test chips with Trojan emulation capabilities and also using simulations, we demonstrate: 1. The area and power costs of our approach can range between 7.4-165% and 0.07-60%, respectively, depending on the design and the attacks targeted; 2. The speed impact can be minimal (close to 0%); 3. Our approach can detect 99.998% of Trojans (emulated using test chips) that do not require detailed knowledge of the design being attacked; 4. Our approach can prevent 99.98% of specific attacks (simulated) that utilize detailed knowledge of the design being attacked (e.g., through reverse-engineering). 5. Our approach never produces any false positives, i.e., it does not report attacks when the IC operates correctly.