CLMar 16, 2022
CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual SignalsScott Novotney, Sreeparna Mukherjee, Zeeshan Ahmed et al. · amazon-science
We propose a framework to modularize the training of neural language models that use diverse forms of sentence-external context (including metadata) by eliminating the need to jointly train sentence-external and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one set of context, such as date and author, and adapts to novel metadata types, such as article title, or previous sentence. The model consists of a pretrained neural sentence LM, a BERT-based context encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and sentence-external information. When context or metadata are unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real contextual information can be introduced later and used to adapt a small number of parameters that map contextual data into the decoder's embedding space. We validate the CUE framework on a NYTimes text corpus with multiple metadata types, for which the LM perplexity can be lowered from 36.6 to 27.4 by conditioning on context. Bootstrapping a contextual LM with only a subset of the context/metadata during training retains 85\% of the achievable gain. Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context. Furthermore, we can swap one type of pretrained sentence LM for another without retraining the context encoders, by only adapting the decoder model. Overall, we obtain a modular framework that allows incremental, scalable training of context-enhanced LMs.
CLSep 12, 2023
Recovering from Privacy-Preserving Masking with Large Language ModelsArpita Vats, Zhe Liu, Peng Su et al.
Model adaptation is crucial to handle the discrepancy between proxy training data and actual users data received. To effectively perform adaptation, textual data of users is typically stored on servers or their local devices, where downstream natural language processing (NLP) models can be directly trained using such in-domain data. However, this might raise privacy and security concerns due to the extra risks of exposing user information to adversaries. Replacing identifying information in textual data with a generic marker has been recently explored. In this work, we leverage large language models (LLMs) to suggest substitutes of masked tokens and have their effectiveness evaluated on downstream language modeling tasks. Specifically, we propose multiple pre-trained and fine-tuned LLM-based approaches and perform empirical studies on various datasets for the comparison of these methods. Experimental results show that models trained on the obfuscation corpora are able to achieve comparable performance with the ones trained on the original data without privacy-preserving token masking.
STDec 8, 2022
Nostradamus: Weathering WorthAlapan Chaudhuri, Zeeshan Ahmed, Ashwin Rao et al.
Nostradamus, inspired by the French astrologer and reputed seer, is a detailed study exploring relations between environmental factors and changes in the stock market. In this paper, we analyze associative correlation and causation between environmental elements (including natural disasters, climate and weather conditions) and stock prices, using historical stock market data, historical climate data, and various climate indicators such as carbon dioxide emissions. We have conducted our study based on the US financial market, global climate trends, and daily weather records to demonstrate a significant relationship between climate and stock price fluctuation. Our analysis covers both short-term and long-term rises and dips in company stock performances. Lastly, we take four natural disasters as a case study to observe the effect they have on people's emotional state and their influence on the stock market.
INS-DETSep 14, 2024
Evaluating probabilistic and data-driven inference models for fiber-coupled NV-diamond temperature sensorsShraddha Rajpal, Zeeshan Ahmed, Tyrus Berry
We evaluate the impact of inference model on uncertainties when using continuous wave Optically Detected Magnetic Resonance (ODMR) measurements to infer temperature. Our approach leverages a probabilistic feedforward inference model designed to maximize the likelihood of observed ODMR spectra through automatic differentiation. This model effectively utilizes the temperature dependence of spin Hamiltonian parameters to infer temperature from spectral features in the ODMR data. We achieve prediction uncertainty of $\pm$ 1 K across a temperature range of 243 K to 323 K. To benchmark our probabilistic model, we compare it with a non-parametric peak-finding technique and data-driven methodologies such as Principal Component Regression (PCR) and a 1D Convolutional Neural Network (CNN). We find that when validated against out-of-sample dataset that encompasses the same temperature range as the training dataset, data driven methods can show uncertainties that are as much as 0.67 K lower without incorporating expert-level understanding of the spectroscopic-temperature relationship. However, our results show that the probabilistic model outperforms both PCR and CNN when tasked with extrapolating beyond the temperature range used in training set, indicating robustness and generalizability. In contrast, data-driven methods like PCR and CNN demonstrate up to ten times worse uncertainties when tasked with extrapolating outside their training data range.
CLMar 28, 2025
Non-Monotonic Attention-based Read/Write Policy Learning for Simultaneous TranslationZeeshan Ahmed, Frank Seide, Zhe Liu et al.
Simultaneous or streaming machine translation generates translation while reading the input stream. These systems face a quality/latency trade-off, aiming to achieve high translation quality similar to non-streaming models with minimal latency. We propose an approach that efficiently manages this trade-off. By enhancing a pretrained non-streaming model, which was trained with a seq2seq mechanism and represents the upper bound in quality, we convert it into a streaming model by utilizing the alignment between source and target tokens. This alignment is used to learn a read/write decision boundary for reliable translation generation with minimal input. During training, the model learns the decision boundary through a read/write policy module, employing supervised learning on the alignment points (pseudo labels). The read/write policy module, a small binary classification unit, can control the quality/latency trade-off during inference. Experimental results show that our model outperforms several strong baselines and narrows the gap with the non-streaming baseline model.
ASDec 19, 2024
Transcribing and Translating, Fast and Slow: Joint Speech Translation and RecognitionNiko Moritz, Ruiming Xie, Yashesh Gaur et al.
We propose the joint speech translation and recognition (JSTAR) model that leverages the fast-slow cascaded encoder architecture for simultaneous end-to-end automatic speech recognition (ASR) and speech translation (ST). The model is transducer-based and uses a multi-objective training strategy that optimizes both ASR and ST objectives simultaneously. This allows JSTAR to produce high-quality streaming ASR and ST results. We apply JSTAR in a bilingual conversational speech setting with smart-glasses, where the model is also trained to distinguish speech from different directions corresponding to the wearer and a conversational partner. Different model pre-training strategies are studied to further improve results, including training of a transducer-based streaming machine translation (MT) model for the first time and applying it for parameter initialization of JSTAR. We demonstrate superior performances of JSTAR compared to a strong cascaded ST model in both BLEU scores and latency.
CLAug 18, 2025
Overcoming Latency Bottlenecks in On-Device Speech Translation: A Cascaded Approach with Alignment-Based Streaming MTZeeshan Ahmed, Frank Seide, Niko Moritz et al.
This paper tackles several challenges that arise when integrating Automatic Speech Recognition (ASR) and Machine Translation (MT) for real-time, on-device streaming speech translation. Although state-of-the-art ASR systems based on Recurrent Neural Network Transducers (RNN-T) can perform real-time transcription, achieving streaming translation in real-time remains a significant challenge. To address this issue, we propose a simultaneous translation approach that effectively balances translation quality and latency. We also investigate efficient integration of ASR and MT, leveraging linguistic cues generated by the ASR system to manage context and utilizing efficient beam-search pruning techniques such as time-out and forced finalization to maintain system's real-time factor. We apply our approach to an on-device bilingual conversational speech translation and demonstrate that our techniques outperform baselines in terms of latency and quality. Notably, our technique narrows the quality gap with non-streaming translation systems, paving the way for more accurate and efficient real-time speech translation.
CLSep 1, 2023
Contextual Biasing of Named-Entities with Large Language ModelsChuanneng Sun, Zeeshan Ahmed, Yingyi Ma et al.
This paper studies contextual biasing with Large Language Models (LLMs), where during second-pass rescoring additional contextual information is provided to a LLM to boost Automatic Speech Recognition (ASR) performance. We propose to leverage prompts for a LLM without fine tuning during rescoring which incorporate a biasing list and few-shot examples to serve as additional information when calculating the score for the hypothesis. In addition to few-shot prompt learning, we propose multi-task training of the LLM to predict both the entity class and the next token. To improve the efficiency for contextual biasing and to avoid exceeding LLMs' maximum sequence lengths, we propose dynamic prompting, where we select the most likely class using the class tag prediction, and only use entities in this class as contexts for next token prediction. Word Error Rate (WER) evaluation is performed on i) an internal calling, messaging, and dictation dataset, and ii) the SLUE-Voxpopuli dataset. Results indicate that biasing lists and few-shot examples can achieve 17.8% and 9.6% relative improvement compared to first pass ASR, and that multi-task training and dynamic prompting can achieve 20.0% and 11.3% relative WER improvement, respectively.
ASFeb 17, 2022
Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech RecognitionChao-Han Huck Yang, Zeeshan Ahmed, Yile Gu et al.
In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. We focus on a rigorous and empirical "closed-model adversarial robustness" setting (e.g., on-device or cloud applications). The adversarial noise is only generated by closed-model optimization (e.g., evolutionary and zeroth-order estimation) without accessing gradient information of a targeted ASR model directly. We propose an advanced Bayesian neural network (BNN) based adversarial detector, which could model latent distributions against adaptive adversarial perturbation with divergence measurement. We further simulate deployment scenarios of RNN Transducer, Conformer, and wav2vec-2.0 based ASR systems with the proposed adversarial detection system. Leveraging the proposed BNN based detection system, we improve detection rate by +2.77 to +5.42% (relative +3.03 to +6.26%) and reduce the word error rate by 5.02 to 7.47% on LibriSpeech datasets compared to the current model enhancement methods against the adversarial speech examples.
CRJan 21, 2022
Blockchain-based Collaborated Federated Learning for Improved Security, Privacy and ReliabilityAmir Afaq, Zeeshan Ahmed, Noman Haider et al.
Federated Learning (FL) provides privacy preservation by allowing the model training at edge devices without the need of sending the data from edge to a centralized server. FL has distributed the implementation of ML. Another variant of FL which is well suited for the Internet of Things (IoT) is known as Collaborated Federated Learning (CFL), which does not require an edge device to have a direct link to the model aggregator. Instead, the devices can connect to the central model aggregator via other devices using them as relays. Although, FL and CFL protect the privacy of edge devices but raises security challenges for a centralized server that performs model aggregation. The centralized server is prone to malfunction, backdoor attacks, model corruption, adversarial attacks and external attacks. Moreover, edge device to centralized server data exchange is not required in FL and CFL, but model parameters are sent from the model aggregator (global model) to edge devices (local model), which is still prone to cyber-attacks. These security and privacy concerns can be potentially addressed by Blockchain technology. The blockchain is a decentralized and consensus-based chain where devices can share consensus ledgers with increased reliability and security, thus significantly reducing the cyberattacks on an exchange of information. In this work, we will investigate the efficacy of blockchain-based decentralized exchange of model parameters and relevant information among edge devices and from a centralized server to edge devices. Moreover, we will be conducting the feasibility analysis for blockchain-based CFL models for different application scenarios like the internet of vehicles, and the internet of things. The proposed study aims to improve the security, reliability and privacy preservation by the use of blockchain-powered CFL.
LGJun 14, 2021
CathAI: Fully Automated Interpretation of Coronary Angiograms Using Neural NetworksRobert Avram, Jeffrey E. Olgin, Alvin Wan et al.
Coronary heart disease (CHD) is the leading cause of adult death in the United States and worldwide, and for which the coronary angiography procedure is the primary gateway for diagnosis and clinical management decisions. The standard-of-care for interpretation of coronary angiograms depends upon ad-hoc visual assessment by the physician operator. However, ad-hoc visual interpretation of angiograms is poorly reproducible, highly variable and bias prone. Here we show for the first time that fully-automated angiogram interpretation to estimate coronary artery stenosis is possible using a sequence of deep neural network algorithms. The algorithmic pipeline we developed--called CathAI--achieves state-of-the art performance across the sequence of tasks required to accomplish automated interpretation of unselected, real-world angiograms. CathAI (Algorithms 1-2) demonstrated positive predictive value, sensitivity and F1 score of >=90% to identify the projection angle overall and >=93% for left or right coronary artery angiogram detection, the primary anatomic structures of interest. To predict obstructive coronary artery stenosis (>=70% stenosis), CathAI (Algorithm 4) exhibited an area under the receiver operating characteristic curve (AUC) of 0.862 (95% CI: 0.843-0.880). When externally validated in a healthcare system in another country, CathAI AUC was 0.869 (95% CI: 0.830-0.907) to predict obstructive coronary artery stenosis. Our results demonstrate that multiple purpose-built neural networks can function in sequence to accomplish the complex series of tasks required for automated analysis of real-world angiograms. Deployment of CathAI may serve to increase standardization and reproducibility in coronary stenosis assessment, while providing a robust foundation to accomplish future tasks for algorithmic angiographic interpretation.
CRJan 1, 2021
PHOENIX: Device-Centric Cellular Network Protocol Monitoring using Runtime VerificationMitziu Echeverria, Zeeshan Ahmed, Bincheng Wang et al.
End-user-devices in the current cellular ecosystem are prone to many different vulnerabilities across different generations and protocol layers. Fixing these vulnerabilities retrospectively can be expensive, challenging, or just infeasible. A pragmatic approach for dealing with such a diverse set of vulnerabilities would be to identify attack attempts at runtime on the device side, and thwart them with mitigating and corrective actions. Towards this goal, in the paper we propose a general and extendable approach called Phoenix for identifying n-day cellular network control-plane vulnerabilities as well as dangerous practices of network operators from the device vantage point. Phoenix monitors the device-side cellular network traffic for performing signature-based unexpected behavior detection through lightweight runtime verification techniques. Signatures in Phoenix can be manually-crafted by a cellular network security expert or can be automatically synthesized using an optional component of Phoenix, which reduces the signature synthesis problem to the language learning from the informant problem. Based on the corrective actions that are available to Phoenix when an undesired behavior is detected, different instantiations of Phoenix are possible: a full-fledged defense when deployed inside a baseband processor; a user warning system when deployed as a mobile application; a probe for identifying attacks in the wild. One such instantiation of Phoenix was able to identify all 15 representative n-day vulnerabilities and unsafe practices of 4G LTE networks considered in our evaluation with a high packet processing speed (~68000 packets/second) while inducing only a moderate amount of energy overhead (~4mW).
LGMay 14, 2019
Machine Learning at Microsoft with ML .NETZeeshan Ahmed, Saeed Amizadeh, Mikhail Bilenko et al.
Machine Learning is transitioning from an art and science into a technology available to every developer. In the near future, every application on every platform will incorporate trained models to encode data-based decisions that would be impossible for developers to author. This presents a significant engineering challenge, since currently data science and modeling are largely decoupled from standard software development processes. This separation makes incorporating machine learning capabilities inside applications unnecessarily costly and difficult, and furthermore discourage developers from embracing ML in first place. In this paper we present ML .NET, a framework developed at Microsoft over the last decade in response to the challenge of making it easy to ship machine learning models in large software applications. We present its architecture, and illuminate the application demands that shaped it. Specifically, we introduce DataView, the core data abstraction of ML .NET which allows it to capture full predictive pipelines efficiently and consistently across training and inference lifecycles. We close the paper with a surprisingly favorable performance study of ML .NET compared to more recent entrants, and a discussion of some lessons learned.
CVFeb 26, 2013
Image-based Face Detection and Recognition: "State of the Art"Faizan Ahmad, Aaima Najam, Zeeshan Ahmed
Face recognition from image or video is a popular topic in biometrics research. Many public places usually have surveillance cameras for video capture and these cameras have their significant value for security purpose. It is widely acknowledged that the face recognition have played an important role in surveillance system as it doesn't need the object's cooperation. The actual advantages of face based identification over other biometrics are uniqueness and acceptance. As human face is a dynamic object having high degree of variability in its appearance, that makes face detection a difficult problem in computer vision. In this field, accuracy and speed of identification is a main issue. The goal of this paper is to evaluate various face detection and recognition methods, provide complete solution for image based face detection and recognition with higher accuracy, better response rate as an initial step for video surveillance. Solution is proposed based on performed tests on various face rich databases in terms of subjects, pose, emotions, race and light.