Xiyao Ma

h-index: 9

6 papers

1,305 citations

Novelty: 56%

AI Score: 39

Ranked #82,032 of 194,257 authors (top 42%); #15,556 in CL (top 51%)

6 Papers

8.2 · CL · Mar 11, 2024 · Code

MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning

Yichuan Li, Xiyao Ma, Sixing Lu et al.

Large language models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities, where an LLM makes predictions for a given test input together with a few input-output pairs (demonstrations). Nevertheless, the inclusion of demonstrations leads to a quadratic increase in the computational overhead of the self-attention mechanism. Existing solutions attempt to distill lengthy demonstrations into compact vectors. However, they often require task-specific retraining or compromise the LLM's in-context learning performance. To mitigate these challenges, we present Meta dEmonstratioN Distillation (MEND), where a language model learns to distill any lengthy demonstrations into vectors without retraining for a new downstream task. We exploit knowledge distillation to enhance alignment between MEND and the LLM, achieving both efficiency and effectiveness simultaneously. MEND is endowed with the meta-knowledge of distilling demonstrations through a two-stage training process, which includes meta-distillation pretraining and fine-tuning. Comprehensive evaluations across seven diverse ICL task partitions using decoder-only (GPT-2) and encoder-decoder (T5) models attest to MEND's prowess. It not only matches but often outperforms the Vanilla ICL as well as other state-of-the-art distillation models, while significantly reducing the computational demands. This innovation promises enhanced scalability and efficiency for the practical deployment of large language models.
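To make the quadratic-overhead claim concrete, the following is a rough sketch rather than anything taken from the MEND paper. It counts query-key pairs in full self-attention and compares a prompt that carries raw demonstrations with one that carries a short block of distilled vectors. The function names and all token counts are hypothetical, and constant factors (heads, layers, causal masking) are ignored.

```python
def attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by full self-attention over seq_len positions."""
    return seq_len * seq_len


def relative_cost(test_len: int, demo_len: int, num_demos: int, distilled_len: int) -> float:
    """Ratio of attention pairs: raw demonstrations vs. distilled vectors.

    All arguments are illustrative token counts, not values from the paper.
    """
    full_prompt = test_len + demo_len * num_demos   # test input plus every demonstration
    compact_prompt = test_len + distilled_len       # test input plus distilled vectors
    return attention_pairs(full_prompt) / attention_pairs(compact_prompt)


if __name__ == "__main__":
    # Hypothetical setting: a 50-token test input, four 50-token demonstrations,
    # and 10 distilled vectors standing in for the demonstrations.
    ratio = relative_cost(test_len=50, demo_len=50, num_demos=4, distilled_len=10)
    print(f"full prompt needs about {ratio:.1f}x the attention pairs")
```

Under these assumed lengths the prompt grows 5x (50 to 250 positions) while the pair count grows 25x, which is the scaling the abstract refers to.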

26.6 · LG · Sep 17, 2020

Distilled One-Shot Federated Learning

Yanlin Zhou, George Pu, Xiyao Ma et al.

Current federated learning algorithms take tens of communication rounds transmitting unwieldy model weights under ideal circumstances and hundreds when data is poorly distributed. Inspired by recent work on dataset distillation and distributed one-shot learning, we propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving comparable performance. In just one round, each client distills their private dataset and sends the synthetic data (e.g. images or sentences) to the server, and the clients collectively train a global model. The distilled data look like noise and are only useful to the specific model weights, i.e., become useless after the model updates. With this weight-less and gradient-less design, the total communication cost of DOSFL is up to three orders of magnitude less than FedAvg while preserving between 93% and 99% of the performance of a centralized counterpart. Afterwards, clients could switch to traditional methods such as FedAvg to finetune the last few percent to fit personalized local models with local datasets. Through comprehensive experiments, we show the accuracy and communication performance of DOSFL on both vision and language tasks with different models including CNN, LSTM, Transformer, etc. We demonstrate that an eavesdropping attacker cannot properly train a good model using the leaked distilled data, without knowing the initial model weights. DOSFL serves as an inexpensive method to quickly converge on a performant pre-trained model with less than 0.1% communication cost of traditional methods.

0.3 · CL · Sep 16, 2020 · Code

Asking Complex Questions with Multi-hop Answer-focused Reasoning

Xiyao Ma, Qile Zhu, Yanlin Zhou et al.

Asking questions from natural language text has attracted increasing attention recently, and several schemes have been proposed with promising results by asking the right question words and copying relevant words from the input to the question. However, most state-of-the-art methods focus on asking simple questions involving single-hop relations. In this paper, we propose a new task called multi-hop question generation that asks complex and semantically relevant questions by additionally discovering and modeling the multiple entities and their semantic relations given a collection of documents and the corresponding answer. To solve the problem, we propose multi-hop answer-focused reasoning on the grounded answer-centric entity graph to include different granularity levels of semantic information including the word-level and document-level semantics of the entities and their semantic relations. Through extensive experiments on the HOTPOTQA dataset, we demonstrate the superiority and effectiveness of our proposed model that serves as a baseline to motivate future work.

53.0 · LG · Apr 27, 2020 · Code

A Batch Normalized Inference Network Keeps the KL Vanishing Away

Qile Zhu, Jianlin Su, Wei Bi et al.

Variational Autoencoder (VAE) is widely used as a generative model to approximate a model's posterior on latent variables by combining amortized variational inference and deep neural networks. However, when paired with strong autoregressive decoders, VAE often converges to a degenerate local optimum known as "posterior collapse". Previous approaches consider the Kullback-Leibler (KL) divergence individually for each datapoint. We propose to let the KL follow a distribution across the whole dataset, and show by analysis that keeping the expectation of the KL's distribution positive is sufficient to prevent posterior collapse. Then we propose Batch Normalized-VAE (BN-VAE), a simple but effective approach to set a lower bound of the expectation by regularizing the distribution of the approximate posterior's parameters. Without introducing any new model component or modifying the objective, our approach can avoid the posterior collapse effectively and efficiently. We further show that the proposed BN-VAE can be extended to conditional VAE (CVAE). Empirically, our approach surpasses strong autoregressive baselines on language modeling, text classification and dialogue generation, and rivals more complex approaches while keeping almost the same training time as VAE.
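The abstract does not spell out how regularizing the posterior's parameters yields a lower bound on the expected KL. One plausible reading, assumed here for illustration, is that the posterior means are batch-normalized with a fixed scale `gamma` and shift `beta`. For a diagonal Gaussian posterior and a standard normal prior, the batch-averaged KL is then at least `dim * (gamma**2 + beta**2) / 2`, because the variance term of the KL is never negative. The sketch below checks this numerically on synthetic, near-collapsed posterior parameters; every name and value in it is hypothetical.

```python
import math
import random


def batch_normalize(column, gamma, beta):
    """Normalize one latent dimension across the batch, then scale and shift."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var)
    return [gamma * (x - mean) / std + beta for x in column]


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) for a single datapoint."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def mean_kl(mus, log_vars):
    """Average the per-datapoint KL over the batch."""
    return sum(gaussian_kl(m, lv) for m, lv in zip(mus, log_vars)) / len(mus)


random.seed(0)
batch, dim = 256, 8
gamma, beta = 0.6, 0.0  # hypothetical fixed scale and shift

# Synthetic "near-collapsed" posterior: means close to 0, variances close to 1.
raw_mu = [[random.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(batch)]
log_var = [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(batch)]

# Batch-normalize each latent dimension (column) across the batch.
columns = [batch_normalize([row[j] for row in raw_mu], gamma, beta) for j in range(dim)]
mu_bn = [[columns[j][i] for j in range(dim)] for i in range(batch)]

mean_kl_raw = mean_kl(raw_mu, log_var)
mean_kl_bn = mean_kl(mu_bn, log_var)
lower_bound = dim * (gamma ** 2 + beta ** 2) / 2.0

print(f"mean KL without normalization: {mean_kl_raw:.4f}")
print(f"mean KL with normalized means: {mean_kl_bn:.4f}")
print(f"lower bound:                   {lower_bound:.4f}")
```

The bound depends only on `dim`, `gamma`, and `beta`, so in this sketch it holds without any change to the training objective, which is consistent with the abstract's claim.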

2.6 · CL · Dec 2, 2019

Improving Question Generation with Sentence-level Semantic Matching and Answer Position Inferring

Xiyao Ma, Qile Zhu, Yanlin Zhou et al.

Taking an answer and its context as input, sequence-to-sequence models have made considerable progress on question generation. However, we observe that these approaches often generate wrong question words or keywords and copy answer-irrelevant words from the input. We believe the key root causes are a lack of global question semantics and poor exploitation of answer position-awareness. In this paper, we propose a neural question generation model with two concrete modules: sentence-level semantic matching and answer position inferring. Further, we enhance the initial state of the decoder by leveraging the answer-aware gated fusion mechanism. Experimental results demonstrate that our model outperforms the state-of-the-art (SOTA) models on SQuAD and MARCO datasets. Owing to its generality, our work also improves the existing models significantly.

7.3 · RO · Nov 15, 2019

Adaptive Leader-Follower Formation Control and Obstacle Avoidance via Deep Reinforcement Learning

Yanlin Zhou, Fan Lu, George Pu et al.

We propose a deep reinforcement learning (DRL) methodology for the tracking, obstacle avoidance, and formation control of nonholonomic robots. By separating vision-based control into a perception module and a controller module, we can train a DRL agent without sophisticated physics or 3D modeling. In addition, the modular framework averts daunting retrains of an image-to-action end-to-end neural network, and provides flexibility in transferring the controller to different robots. First, we train a convolutional neural network (CNN) to accurately localize in an indoor setting with dynamic foreground/background. Then, we design a new DRL algorithm named Momentum Policy Gradient (MPG) for continuous control tasks and prove its convergence. We also show that MPG is robust at tracking varying leader movements and can naturally be extended to problems of formation control. Leveraging reward shaping, features such as collision and obstacle avoidance can be easily integrated into a DRL controller.