CLJul 20, 2023
Layer-wise Representation Fusion for Compositional GeneralizationYafang Zheng, Lei Lin, Shuangtao Li et al.
Existing neural models are demonstrated to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for failure on CG is that the syntactic and semantic representations of sequences in both the uppermost layer of the encoder and decoder are entangled. However, previous work concentrates on separating the learning of syntax and semantics instead of exploring the reasons behind the representation entanglement (RE) problem to solve it. We explain why it exists by analyzing the representation evolving mechanism from the bottom to the top of the Transformer layers. We find that the ``shallow'' residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting between layers and further the RE problems. Inspired by this, we propose LRF, a novel \textbf{L}ayer-wise \textbf{R}epresentation \textbf{F}usion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process effectively through introducing a \emph{fuse-attention module} at each encoder and decoder layer. LRF achieves promising results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal.
CLMar 21, 2023
LEAPT: Learning Adaptive Prefix-to-prefix Translation For Simultaneous Machine TranslationLei Lin, Shuangtao Li, Xiaodong Shi
Simultaneous machine translation, which aims at a real-time translation, is useful in many live scenarios but very challenging due to the trade-off between accuracy and latency. To achieve the balance for both, the model needs to wait for appropriate streaming text (READ policy) and then generates its translation (WRITE policy). However, WRITE policies of previous work either are specific to the method itself due to the end-to-end training or suffer from the input mismatch between training and decoding for the non-end-to-end training. Therefore, it is essential to learn a generic and better WRITE policy for simultaneous machine translation. Inspired by strategies utilized by human interpreters and "wait" policies, we propose a novel adaptive prefix-to-prefix training policy called LEAPT, which allows our machine translation model to learn how to translate source sentence prefixes and make use of the future context. Experiments show that our proposed methods greatly outperform competitive baselines and achieve promising results.
CLMay 20, 2023Code
Learning to Compose Representations of Different Encoder Layers towards Improving Compositional GeneralizationLei Lin, Shuangtao Li, Yafang Zheng et al.
Recent studies have shown that sequence-to-sequence (seq2seq) models struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. There is mounting evidence that one of the reasons hindering CG is the representation of the encoder uppermost layer is entangled, i.e., the syntactic and semantic representations of sequences are entangled. However, we consider that the previously identified representation entanglement problem is not comprehensive enough. Additionally, we hypothesize that the source keys and values representations passing into different decoder layers are also entangled. Starting from this intuition, we propose \textsc{CompoSition} (\textbf{Compo}se \textbf{S}yntactic and Semant\textbf{i}c Representa\textbf{tion}s), an extension to seq2seq models which learns to compose representations of different encoder layers dynamically for different tasks, since recent studies reveal that the bottom layers of the Transformer encoder contain more syntactic information and the top ones contain more semantic information. Specifically, we introduce a \textit{composed layer} between the encoder and decoder to compose different encoder layers' representations to generate specific keys and values passing into different decoder layers. \textsc{CompoSition} achieves competitive results on two comprehensive and realistic benchmarks, which empirically demonstrates the effectiveness of our proposal. Codes are available at~\url{https://github.com/thinkaboutzero/COMPOSITION}.
AIJan 2, 2025
Enhancing Reasoning through Process Supervision with Monte Carlo Tree SearchShuangtao Li, Shuaihao Dong, Kexin Luan et al.
Large language models (LLMs) have demonstrated their remarkable capacity across a variety of tasks. However, reasoning remains a challenge for LLMs. To improve LLMs' reasoning ability, process supervision has proven to be better than outcome supervision. In this work, we study using Monte Carlo Tree Search (MCTS) to generate process supervision data with LLMs themselves for training them. We sample reasoning steps with an LLM and assign each step a score that captures its "relative correctness," and the LLM is then trained by minimizing weighted log-likelihood of generating the reasoning steps. This generate-then-train process is repeated iteratively until convergence.Our experimental results demonstrate that the proposed methods considerably improve the performance of LLMs on two mathematical reasoning datasets. Furthermore, models trained on one dataset also exhibit improved performance on the other, showing the transferability of the enhanced reasoning ability.
LGApr 20, 2018
Learning More Robust Features with Adversarial TrainingShuangtao Li, Yuanke Chen, Yanlin Peng et al.
In recent years, it has been found that neural networks can be easily fooled by adversarial examples, which is a potential safety hazard in some safety-critical applications. Many researchers have proposed various method to make neural networks more robust to white-box adversarial attacks, but an effective method have not been found so far. In this short paper, we focus on the robustness of the features learned by neural networks. We show that the features learned by neural networks are not robust, and find that the robustness of the learned features is closely related to the resistance against adversarial examples of neural networks. We also find that adversarial training against fast gradients sign method (FGSM) does not make the leaned features very robust, even if it can make the trained networks very resistant to FGSM attack. Then we propose a method, which can be seen as an extension of adversarial training, to train neural networks to learn more robust features. We perform experiments on MNIST and CIFAR-10 to evaluate our method, and the experiment results show that this method greatly improves the robustness of the learned features and the resistance to adversarial attacks.