Louis-S. Bouchard

SP · 128 citations

5 Papers

GN · Sep 20, 2023
Embed-Search-Align: DNA Sequence Alignment using Transformer Models

Pavan Holur, K. C. Enevoldsen, Shreyas Rajesh et al.

DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in two steps: genome indexing, followed by an efficient search to locate likely positions for given reads. Building on the success of Large Language Models in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce embeddings for DNA sequences. Such models have shown early promise in classifying short DNA sequences, such as detecting coding/non-coding regions or identifying enhancer and promoter sequences. However, performance on sequence classification tasks does not translate to sequence alignment, where each read must be aligned by searching across the entire genome, a significantly longer-range task. We bridge this gap by framing the sequence alignment task for Transformer models as an "Embed-Search-Align" task. In this framework, a novel Reference-Free DNA Embedding model generates embeddings of reads and reference fragments, which are projected into a shared vector space where the read-fragment distance is used as a surrogate for alignment. Technical contributions include: (1) a contrastive loss for self-supervised training of DNA sequence representations, facilitating rich reference-free, sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. The resulting system, DNA-ESA, is 99% accurate when aligning reads of length 250 onto a human genome (3 Gb), rivaling conventional methods such as Bowtie and BWA-MEM. DNA-ESA exceeds the performance of six Transformer baselines, including Nucleotide Transformer and HyenaDNA, and shows task transfer across chromosomes and species.
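The Embed-Search-Align idea can be illustrated with a toy pipeline. This is a minimal sketch under loudly stated assumptions: the paper's Transformer encoder is replaced by a hypothetical stand-in (`embed`, an L2-normalized 5-mer count vector), the "vector store" is a brute-force cosine-similarity scan over overlapping fragments, and the final alignment step is a simple minimum-Hamming-distance scan around the top hits. `info_nce` shows the general form of a contrastive (InfoNCE-style) loss over matched pairs; it is not claimed to be the exact loss used in DNA-ESA. Fragment length, stride, `top_k`, and the synthetic genome are illustrative choices.

```python
import numpy as np
from itertools import product

K = 5  # k-mer size of the hypothetical stand-in encoder (assumption)
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def embed(seq):
    """Hypothetical stand-in for a learned encoder: L2-normalized k-mer counts."""
    v = np.zeros(len(KMER_INDEX))
    for i in range(len(seq) - K + 1):
        v[KMER_INDEX[seq[i:i + K]]] += 1.0
    return v / np.linalg.norm(v)

def info_nce(a, b, tau=0.1):
    """InfoNCE-style contrastive loss: row i of `a` should match row i of `b`."""
    logits = a @ b.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def build_store(genome, frag_len=64, stride=16):
    """Toy 'vector store': one embedding per overlapping reference fragment."""
    starts = np.arange(0, len(genome) - frag_len + 1, stride)
    vecs = np.stack([embed(genome[s:s + frag_len]) for s in starts])
    return starts, vecs

def align(read, genome, starts, vecs, frag_len=64, top_k=3):
    """Embed the read, search by cosine similarity, then pick the position
    with minimum Hamming distance in a window around the top fragments."""
    sims = vecs @ embed(read)
    best = (None, len(read) + 1)
    for s in starts[np.argsort(sims)[::-1][:top_k]]:
        lo = max(0, s - len(read))
        hi = min(len(genome) - len(read), s + frag_len)
        for p in range(lo, hi + 1):
            d = sum(x != y for x, y in zip(read, genome[p:p + len(read)]))
            if d < best[1]:
                best = (int(p), d)
    return best  # (position, mismatches)

rng = np.random.default_rng(0)
genome = "".join(rng.choice(list("ACGT"), size=5000))  # synthetic reference
true_pos = 1234
read = list(genome[true_pos:true_pos + 40])
read[10] = "A" if read[10] != "A" else "C"  # one substitution, like a sequencing error
read = "".join(read)

starts, vecs = build_store(genome)
pos, mismatches = align(read, genome, starts, vecs)
print(pos, mismatches)
```

A learned encoder would take the place of `embed`, and an approximate-nearest-neighbor index would take the place of the brute-force scan at genome scale; the sketch only shows the coarse-search-then-local-alignment structure.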

QUANT-PH · Jun 23, 2022
Quantum Approximation of Normalized Schatten Norms and Applications to Learning

Yiyou Chen, Hideyuki Miyahara, Louis-S. Bouchard et al.

Efficient measures of the similarity of quantum states, such as the fidelity metric, have been widely studied. In this paper, we address the problem of defining a similarity measure for quantum operations that can be efficiently estimated. Given two quantum operations, $U_1$ and $U_2$, represented in their circuit forms, we first develop a quantum sampling circuit to estimate the normalized Schatten 2-norm of their difference, $\| U_1 - U_2 \|_{S_2}$, with precision $\epsilon$, using only one clean qubit and one classical random variable. We prove a $\mathrm{poly}(1/\epsilon)$ upper bound on the sample complexity, which is independent of the size of the quantum system. We then show that this similarity metric is directly related to a functional definition of the similarity of unitary operations based on the conventional fidelity metric of quantum states ($F$): if $\| U_1 - U_2 \|_{S_2}$ is sufficiently small (e.g., $\| U_1 - U_2 \|_{S_2} \leq \frac{\epsilon}{1+\sqrt{2(1/\delta - 1)}}$), then the fidelity of the states obtained by applying both operations to the same uniformly random pure state $|\psi\rangle$ is as high as needed, $F(U_1|\psi\rangle, U_2|\psi\rangle) \geq 1-\epsilon$, with probability exceeding $1-\delta$. We provide example applications of this efficient similarity-metric estimation framework to quantum circuit learning tasks, such as finding the square root of a given unitary operation.
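The link between the normalized Schatten 2-norm and state fidelity can be checked numerically on small systems. The sketch below is a classical simulation, not the one-clean-qubit sampling circuit, and it assumes the normalization $\|A\|_{S_2} = \sqrt{\mathrm{Tr}(A^\dagger A)/d}$ for a $d$-dimensional system and the fidelity $F = |\langle a|b\rangle|^2$. Under that normalization, averaging $\|(U_1-U_2)|\psi\rangle\|^2$ over uniformly random pure states reproduces $\|U_1-U_2\|_{S_2}^2$, because $\mathbb{E}[|\psi\rangle\langle\psi|] = I/d$. The helper names (`normalized_schatten2`, `haar_unitary`, `random_states`, `threshold`) and the perturbation used to build $U_2$ are hypothetical.

```python
import numpy as np

def normalized_schatten2(a):
    """sqrt(Tr(A^dagger A) / d): the assumed normalized Schatten 2-norm."""
    d = a.shape[0]
    return np.sqrt(np.trace(a.conj().T @ a).real / d)

def haar_unitary(d, rng):
    """Haar-random unitary: QR of a complex Gaussian matrix with phase fix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def random_states(n, d, rng):
    """n uniformly random pure states (as rows) from normalized complex Gaussians."""
    v = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def threshold(eps, delta):
    """Sufficient bound on ||U1 - U2||_{S2} quoted in the abstract."""
    return eps / (1 + np.sqrt(2 * (1 / delta - 1)))

rng = np.random.default_rng(1)
d = 8  # three qubits
u1 = haar_unitary(d, rng)
# Hypothetical U2: U1 followed by a small evolution exp(-i t H), H random Hermitian
h = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
h = (h + h.conj().T) / 2
w, v = np.linalg.eigh(h)
u2 = u1 @ (v @ np.diag(np.exp(-1j * 0.01 * w)) @ v.conj().T)

norm = normalized_schatten2(u1 - u2)
psi = random_states(20000, d, rng)
# For row vectors, (A @ psi_k) equals row k of psi @ A.T
diff_sq = np.linalg.norm(psi @ (u1 - u2).T, axis=1) ** 2
fid = np.abs(np.sum((psi @ u1.T).conj() * (psi @ u2.T), axis=1)) ** 2

print(norm ** 2, diff_sq.mean())   # the two numbers should nearly agree
print(np.mean(fid >= 1 - 1e-3))    # fraction of random inputs with F >= 1 - 1e-3
print(threshold(0.1, 0.05))        # norm bound for eps = 0.1, delta = 0.05
```

For unit vectors, $|\langle a|b\rangle| \geq \mathrm{Re}\langle a|b\rangle = 1 - \|a-b\|^2/2$, so a small average of $\|(U_1-U_2)|\psi\rangle\|^2$ forces high fidelity for most random inputs, which is the intuition behind the probabilistic bound.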

OTHER · Apr 6
Weak Solutions to the Bloch Equations with Distant Dipolar Field

Louis-S. Bouchard

The distant dipolar field (DDF) is a long-range, nonlocal contribution to liquid-state spin dynamics that arises from intermolecular dipolar couplings and can generate multiple-quantum coherences and novel MRI contrast. Its sign-changing kernel makes Bloch-DDF dynamics strongly geometry dependent, and FFT-based dipolar convolutions naturally assume periodic or padded Cartesian domains rather than bounded samples with reflective diffusion boundaries. We study the Bloch equations with the DDF on bounded domains under homogeneous Neumann diffusion conditions. We derive a finite-element weak formulation that supports spatially varying diffusion and relaxation parameters and uses a short-distance regularization of the secular DDF kernel with length $a>0$. For fixed $a$, we prove boundedness of the DDF operator, establish an $L^2$ energy balance in which precession is neutral while diffusion and transverse relaxation are dissipative, and obtain local well-posedness with continuous dependence on the data, with global existence under energy-neutral transport. For the Galerkin semi-discretization, we show a discrete energy identity mirroring the continuum estimate. For computation, we evaluate the DDF in real space with a matrix-free near/far scheme and advance in time using a second-order IMEX splitting method that treats diffusion and relaxation implicitly and precession explicitly. The explicit stage applies a Rodrigues rotation at DDF quadrature points followed by an $L^2$ projection, enabling stable multi-cycle lab-frame simulations. We validate against three closed-form benchmarks and quantify curved-boundary effects by comparing mapped finite elements with a voxel-mask finite-difference baseline on spherical Neumann eigenmode decay. These results provide an analyzable and reproducible route for Bloch-DDF dynamics on bounded domains with complex geometry.
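The explicit precession stage and the implicit/explicit splitting can be illustrated on a single isochromat. This sketch assumes the convention $d\mathbf{M}/dt = \gamma\, \mathbf{M} \times \mathbf{B}$ with $\mathbf{B}$ frozen over a step, for which Rodrigues' rotation formula gives the exact update: rotate $\mathbf{M}$ about $\hat{\mathbf{k}} = \mathbf{B}/|\mathbf{B}|$ by $\theta = -\gamma |\mathbf{B}| \Delta t$. It pairs that rotation with Crank-Nicolson relaxation half-steps in a Strang splitting, which is second order. Diffusion, the DDF field itself, quadrature points, and the $L^2$ projection are all omitted, and the function names are hypothetical; this is not the paper's finite-element scheme.

```python
import numpy as np

def rodrigues_precess(m, b, gamma, dt):
    """Exact step of dM/dt = gamma * M x B for B constant over dt (Rodrigues)."""
    bnorm = np.linalg.norm(b)
    if bnorm == 0.0:
        return m.copy()
    k = b / bnorm
    theta = -gamma * bnorm * dt  # M x B = -|B| (k x M), hence the minus sign
    c, s = np.cos(theta), np.sin(theta)
    return m * c + np.cross(k, m) * s + k * np.dot(k, m) * (1.0 - c)

def relax_cn(m, h, t1, t2, m0):
    """Implicit Crank-Nicolson step of Bloch relaxation over a substep h."""
    a2, a1 = h / (2.0 * t2), h / (2.0 * t1)
    mx, my = m[:2] * (1.0 - a2) / (1.0 + a2)
    mz = (m[2] * (1.0 - a1) + h * m0 / t1) / (1.0 + a1)
    return np.array([mx, my, mz])

def strang_step(m, b, gamma, dt, t1, t2, m0):
    """Half relaxation, full explicit precession, half relaxation (2nd order)."""
    m = relax_cn(m, dt / 2, t1, t2, m0)
    m = rodrigues_precess(m, b, gamma, dt)
    return relax_cn(m, dt / 2, t1, t2, m0)

gamma, b = 1.0, np.array([0.0, 0.0, 2 * np.pi])  # one full turn per unit time
m = np.array([1.0, 0.0, 0.0])
for _ in range(100):                               # total time 1.0 with dt = 0.01
    m = strang_step(m, b, gamma, 0.01, t1=2.0, t2=1.0, m0=1.0)
print(m)  # approximately [exp(-1), 0, 1 - exp(-0.5)]
```

Because the rotation is exact for a frozen field, $|\mathbf{M}|$ is unchanged by the explicit stage; every change in magnitude comes from the implicit relaxation substeps.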

SP · Apr 25, 2021
Scalable End-to-End RF Classification: A Case Study on Undersized Dataset Regularization by Convolutional-MST

Khalid Youssef, Greg Schuette, Yubin Cai et al.

Unlike computer vision and speech recognition, where approaches based on convolutional and recurrent neural networks have proven well suited to their respective applications, deep learning (DL) still lacks a general approach suited to the unique nature and challenges of RF systems such as radar, signals intelligence, electronic warfare, and communications. Existing approaches face problems with robustness, consistency, efficiency, repeatability, and scalability. One of the main challenges in RF sensing, such as radar target identification, is the difficulty and cost of obtaining data. Training to classify signals into 2 to 12 classes typically uses hundreds to thousands of samples per class, with reported accuracies ranging from 87% to 99%; accuracy generally decreases as more classes are added. In this paper, we present a new DL approach based on multistage training and demonstrate it on RF sensing signal classification. We consistently achieve over 99% accuracy for up to 17 diverse classes using only 11 training samples per class, an improvement in accuracy of up to 35% over standard DL approaches.

SP · Nov 5, 2017
Machine Learning Approach to RF Transmitter Identification

K. Youssef, Louis-S. Bouchard, K. Z. Haigh et al.

With the development and widespread use of wireless devices in recent years (mobile phones, Internet of Things, Wi-Fi), the electromagnetic spectrum has become extremely crowded. To counter security threats posed by rogue or unknown transmitters, it is important to identify RF transmitters not by the data content of their transmissions but by the intrinsic physical characteristics of the transmitters. RF waveforms pose a particular challenge because of the extremely high data rates involved and the potentially large number of transmitters present at a given location. These factors underline the need for rapid fingerprinting and identification methods that go beyond traditional hand-engineered approaches. In this study, we investigate the application of machine learning (ML) strategies to the classification and identification problems, and the use of wavelets to reduce the amount of data required. Four ML strategies are evaluated: deep neural networks (DNN), convolutional neural networks (CNN), support vector machines (SVM), and multi-stage training (MST) using accelerated Levenberg-Marquardt (A-LM) updates. The A-LM MST method preconditioned by wavelets was by far the most accurate, achieving 100% classification accuracy on data originating from 12 different transmitters. We discuss strategies for extending MST to a much larger number of transmitters.
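The two ingredients named here, wavelet-based data reduction and Levenberg-Marquardt (LM) updates, can each be sketched in a few lines. This is a minimal illustration under assumptions not taken from the paper: a Haar wavelet that keeps only approximation coefficients, plain LM with identity damping rather than the accelerated A-LM variant, a single sigmoid unit rather than a multi-stage network, and synthetic toy signals built from two hypothetical per-transmitter "fingerprint" templates. Function and variable names are illustrative.

```python
import numpy as np

def haar_approx(x, levels):
    """Keep Haar approximation coefficients only; each level halves the length."""
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        a = (a[..., 0::2] + a[..., 1::2]) / np.sqrt(2.0)
    return a

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def train_lm(x, y, iters=50, lam=1e-2):
    """Levenberg-Marquardt on the squared error of one sigmoid unit."""
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    p = np.zeros(xb.shape[1])
    loss = lambda q: 0.5 * np.sum((sigmoid(xb @ q) - y) ** 2)
    history = [loss(p)]
    for _ in range(iters):
        s = sigmoid(xb @ p)
        jac = (s * (1.0 - s))[:, None] * xb  # d(residual)/d(params)
        step = np.linalg.solve(jac.T @ jac + lam * np.eye(len(p)), -jac.T @ (s - y))
        if loss(p + step) < history[-1]:
            p, lam = p + step, lam / 10.0  # accept: lean toward Gauss-Newton
        else:
            lam *= 10.0                    # reject: lean toward gradient descent
        history.append(loss(p))
    return p, history

rng = np.random.default_rng(0)
n_per_class, length = 20, 256
templates = rng.normal(size=(2, length))  # hypothetical transmitter fingerprints
labels = np.repeat([0.0, 1.0], n_per_class)
signals = templates[labels.astype(int)] + 0.5 * rng.normal(size=(2 * n_per_class, length))

features = haar_approx(signals, levels=3)  # 256 -> 32 coefficients per signal
p, history = train_lm(features, labels)
xb = np.hstack([features, np.ones((len(features), 1))])
accuracy = np.mean((sigmoid(xb @ p) > 0.5) == labels)
print(features.shape, history[0], history[-1], accuracy)
```

Each Haar level halves the number of samples, so three levels turn 256-sample signals into 32 features. LM interpolates between Gauss-Newton (small `lam`) and gradient descent (large `lam`), adjusting the damping according to whether a trial step lowers the loss.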