CL · Apr 20, 2023
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
Hao Zhang, Nianwen Si, Yaqi Chen et al.
The end-to-end speech translation (E2E-ST) model has gradually become a mainstream paradigm due to its low latency and reduced error propagation. However, it is non-trivial to train such a model well because of the task complexity and data scarcity. Differences between the speech and text modalities usually leave E2E-ST performance below that of the corresponding machine translation (MT) model. Based on this observation, existing methods often use sharing mechanisms to carry out implicit knowledge transfer by imposing various constraints. However, the final model often performs worse on the MT task than the MT model trained alone, which means that the knowledge transfer ability of such methods is also limited. To deal with these problems, we propose the FCCL (Fine- and Coarse-Granularity Contrastive Learning) approach for E2E-ST, which performs explicit knowledge transfer through cross-modal multi-grained contrastive learning. A key ingredient of our approach is applying contrastive learning at both the sentence and frame levels to give comprehensive guidance for extracting speech representations that contain rich semantic information. In addition, we adopt a simple whitening method to alleviate representation degeneration in the MT model, which adversely affects contrastive learning. Experiments on the MuST-C benchmark show that our proposed approach significantly outperforms the state-of-the-art E2E-ST baselines on all eight language pairs. Further analysis indicates that FCCL can free up its capacity from learning grammatical structure information and force more layers to learn semantic information.
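As a rough illustration of the sentence-level half of the idea, the sketch below computes an InfoNCE-style contrastive loss between pooled speech and text embeddings. It is not the FCCL implementation: the function name `info_nce`, the temperature value, and the use of plain NumPy are assumptions made for illustration, and the frame-level loss and the whitening step are omitted.

```python
import numpy as np

def info_nce(speech, text, temperature=0.1):
    """Sentence-level contrastive loss: speech[i] should match text[i].

    Hypothetical helper for illustration only, not the authors' code.
    """
    # L2-normalise so the dot product becomes cosine similarity.
    s = speech / np.linalg.norm(speech, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # (batch, batch) similarity matrix
    # Numerically stable log-softmax over each row; positives are on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy batch: speech embeddings are noisy copies of the paired text embeddings.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
speech = text + 0.01 * rng.normal(size=(8, 16))
aligned = info_nce(speech, text)          # correct pairing
shuffled = info_nce(speech[::-1], text)   # deliberately wrong pairing
```

With correctly paired rows the loss is small, while pairing each speech vector with the wrong sentence raises it, which is the signal that pulls the two modalities together.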
CL · Apr 20, 2023
Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation
Hao Zhang, Nianwen Si, Yaqi Chen et al.
Existing methods often transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model using elaborate techniques that usually require transcriptions as extra input during training. However, transcriptions are not always available, and how to improve ST performance without them, i.e., data efficiency, has rarely been studied in the literature. In this paper, we propose Decoupled Non-parametric Knowledge Distillation (DNKD), which approaches data efficiency from a data perspective. Our method follows the knowledge distillation paradigm. However, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor (kNN) retrieval, which removes the dependence on transcriptions and on an MT model. We then decouple the classic knowledge distillation loss into target and non-target distillation to strengthen the effect of the knowledge carried by the non-target logits, the prominent "dark knowledge". Experiments on the MuST-C corpus show that the proposed method achieves consistent improvements over a strong baseline without requiring any transcription.
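The two steps described above can be sketched as follows, under stated assumptions: the teacher distribution is built by softmax-weighting the negative distances of retrieved neighbours (a common choice in kNN-based translation, not confirmed by the abstract), and the loss is split into a target term and a renormalised non-target term with hypothetical weights `alpha` and `beta`. The function names and default values are illustrative, not the DNKD code.

```python
import numpy as np

def knn_teacher(distances, neighbor_tokens, vocab_size, temperature=10.0):
    """Turn k retrieved (distance, token) pairs into a vocabulary distribution."""
    weights = np.exp(-np.asarray(distances, dtype=float) / temperature)
    probs = np.zeros(vocab_size)
    np.add.at(probs, np.asarray(neighbor_tokens), weights)  # sum weights per token
    return probs / probs.sum()

def decoupled_kd(teacher, student, target, alpha=1.0, beta=1.0, eps=1e-9):
    """alpha * target-vs-rest KL + beta * KL over renormalised non-target classes."""
    pt_t, pt_s = teacher[target], student[target]
    # Binary distribution: probability of the target vs. everything else.
    b_t = np.array([pt_t, 1.0 - pt_t])
    b_s = np.array([pt_s, 1.0 - pt_s])
    target_kl = np.sum(b_t * (np.log(b_t + eps) - np.log(b_s + eps)))
    # Distribution restricted to the non-target classes ("dark knowledge").
    mask = np.arange(len(teacher)) != target
    n_t = teacher[mask] / (teacher[mask].sum() + eps)
    n_s = student[mask] / (student[mask].sum() + eps)
    non_target_kl = np.sum(n_t * (np.log(n_t + eps) - np.log(n_s + eps)))
    return float(alpha * target_kl + beta * non_target_kl)

# Toy example: 4 neighbours retrieved for one decoding step, vocabulary of 10 tokens.
teacher = knn_teacher([1.0, 2.0, 3.0, 4.0], [2, 2, 5, 7], vocab_size=10)
uniform_student = np.full(10, 0.1)
loss = decoupled_kd(teacher, uniform_student, target=2)
```

Weighting the non-target term independently of the teacher's confidence in the target is what lets it contribute even when the teacher is nearly certain.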
CL · Oct 3, 2023
Tuning Large language model for End-to-end Speech Translation
Hao Zhang, Nianwen Si, Yaqi Chen et al.
With the emergence of large language models (LLMs), multimodal models based on LLMs have demonstrated significant potential. Models such as LLaSM, X-LLM, and SpeechGPT exhibit an impressive ability to comprehend and generate human instructions. However, their performance often falters on complex tasks such as end-to-end speech translation (E2E-ST), a cross-language and cross-modal translation task on which multimodal models lag behind single-modal models. This paper introduces LST, a Large multimodal model designed to excel at the E2E-ST task. LST consists of a speech frontend, an adapter, and an LLM backend. The training of LST consists of two stages: (1) modality adjustment, where the adapter is tuned to align speech representations with the text embedding space, and (2) downstream task fine-tuning, where both the adapter and the LLM are trained to optimize performance on the E2E-ST task. Experimental results on the MuST-C speech translation benchmark demonstrate that LST-13B achieves BLEU scores of 30.39/41.55/35.33 on the En-De/En-Fr/En-Es language pairs, surpassing previous models and establishing a new state-of-the-art. Additionally, we conduct an in-depth analysis of single-modal model selection and the impact of training strategies, which lays the foundation for future research. We will release our code and models after review.
IT · Apr 28
On the Minimum Distances of Some Families of Goppa Codes and BCH Codes
Yaqi Chen, Hao Chen, Cunsheng Ding et al.
Goppa codes form an important class of alternant codes with wide applications in algebraic coding theory and code-based cryptography. Determining the true minimum distance of a Goppa code is a difficult problem. In this paper, we provide a necessary and sufficient criterion for a Goppa code to attain its designed distance $δ=t+1$, where $t$ is the degree of the Goppa polynomial. As applications, we determine the minimum distances of several classes of $q$-ary Goppa codes. In particular, we prove the tightness of the improved lower bound for a class of wild Goppa codes, and extend the family with $G(x)=x^t+A$ from the binary case to arbitrary odd prime powers. We then specialize the criterion to the monomial case $G(x)=x^t$, which is equivalent to primitive BCH codes. This leads to several infinite families of primitive BCH codes with $d=δ$, including the binary codes $\mathbf{C}_{(2,2^m-1,9,1)}$ and $\mathbf{C}_{(2,2^m-1,15,1)}$, the family $\mathbf{C}_{(p,p^p-1,2p+2,1)}$ with an odd prime $p$ and the family $\mathbf{C}_{(q,q^m-1,r\frac{q^m-1}{q-1}+1,1)}$ with $r\mid q-1$. In particular, we prove that the primitive BCH code $\mathbf{C}_{(q,q^m-1,q^t+1,1)}$ has minimum distance $q^t+1$ under the condition $t\mid m$, improving the previously known condition $pt\mid m$.
IT · Apr 26
On the Minimum Distances of Some Families of BCH Codes
Yaqi Chen, Hao Chen, Cunsheng Ding et al.
BCH codes form an important class of cyclic codes, which have applications in communication and data storage systems. Although the BCH bound provides a lower bound on the minimum distance of BCH codes, determining the true minimum distances of BCH codes is a very challenging problem. In this paper, we settle the minimum distances of a number of infinite families of narrow-sense BCH codes. By explicitly constructing the locator polynomials for minimum weight codewords, we obtain many families of primitive and non-primitive BCH codes with $d=δ$, where $d$ is the minimum distance of a $q$-ary BCH code of length $n$, designed distance $δ$, and offset $b$, denoted by $\mathbf{C}_{(q, n, δ, b)}$. For primitive BCH codes, we obtain infinite families of BCH codes over $\mathbb{F}_3$ and $\mathbb{F}_4$ satisfying $d=δ$, where $δ\in \{5,6,7,8\}$. Moreover, we construct several infinite families of $q$-ary BCH codes with $d=δ$, where $2 \le δ\le q-1$. For $δ=q^t+1$, we prove that the BCH code $\mathbf{C}_{(q, q^m-1, q^t+1, 1)}$ has $d=δ$ for all $m$ satisfying $m \equiv 0 \pmod{pt}$, where $p$ denotes the characteristic of $\mathbb{F}_q$. In the paper by Ding et al., IEEE Trans. Inf. Theory 61(5): 2351-2356, it was conjectured that the minimum distance of $\mathbf{C}_{(q, q^m-1, q^t+1, 1)}$ is always equal to its Bose distance $d_B$. Our result confirms this conjecture for the case $m \equiv 0 \pmod{pt}$. For non-primitive BCH codes, we construct a family of BCH codes $\mathbf{C}_{(q,\frac{q^p-1}λ,p+1,1)}$ with $d=δ=p+1$, where $p$ is an odd prime, $q=p^e$ with $p \nmid e$ and $λ\mid q-1$.
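For readers unfamiliar with the statement $d=δ$, the following sketch checks it by brute force on the smallest interesting case, the narrow-sense binary BCH code $\mathbf{C}_{(2,15,5,1)}$. It is an illustration only and does not use the locator-polynomial constructions of the paper; the helper `gf2_mul` and the choice of the primitive polynomial $x^4+x+1$ are assumptions of this example.

```python
def gf2_mul(a, b):
    """Multiply two polynomials over GF(2), each stored as an integer bit mask."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2)[x] is XOR
        a <<= 1
        b >>= 1
    return result

# Generator polynomial of C_(2,15,5,1): g(x) = m_1(x) * m_3(x),
# where m_i is the minimal polynomial of alpha^i and alpha is a root of x^4 + x + 1.
m1 = 0b10011  # x^4 + x + 1
m3 = 0b11111  # x^4 + x^3 + x^2 + x + 1
g = gf2_mul(m1, m3)

n = 15
k = n - (g.bit_length() - 1)  # dimension = n - deg(g)

# Every nonzero codeword is msg(x) * g(x) with deg(msg) < k, so enumerate all of them
# and take the smallest Hamming weight.
min_distance = min(bin(gf2_mul(msg, g)).count("1") for msg in range(1, 2 ** k))
print(n, k, min_distance)  # prints: 15 7 5
```

Here the minimum distance equals the designed distance 5, so the BCH bound is tight for this code. Exhaustive search grows as $q^k$, which is why explicit constructions of minimum-weight codewords are needed for infinite families.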
LG · Apr 1
A Cross-graph Tuning-free GNN Prompting Framework
Yaqi Chen, Shixun Huang, Ryan Twemlow et al.
GNN prompting aims to adapt models across tasks and graphs without requiring extensive retraining. However, most existing graph prompt methods still require task-specific parameter updates and struggle to generalize across graphs, which limits their performance and undermines the core promise of prompting. In this work, we introduce a Cross-graph Tuning-free Prompting Framework (CTP), which supports both homogeneous and heterogeneous graphs, can be directly deployed to unseen graphs without further parameter tuning, and thus enables a plug-and-play GNN inference engine. Extensive experiments on few-shot prediction tasks show that, compared to state-of-the-art methods, CTP achieves an average accuracy gain of 30.8% and a maximum gain of 54%, confirming its effectiveness and offering a new perspective on graph prompt learning.