CLAug 9, 2023
Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense RetrievalTim Hartill, Diana Benavides-Prado, Michael Witbrock et al.
When provided with sufficient explanatory context, smaller Language Models have been shown to exhibit strong reasoning ability on challenging short-answer question-answering tasks where the questions are unseen in training. We evaluate two methods for further improvement in this setting. Both methods focus on combining rationales generated by a larger Language Model with longer contexts created from a multi-hop dense retrieval system. The first method ($\textit{RR}$) involves training a Rationale Ranking model to score both generated rationales and retrieved contexts with respect to relevance and truthfulness. We then use the scores to derive combined contexts from both knowledge sources using a number of combinatory strategies. For the second method ($\textit{RATD}$) we utilise retrieval-augmented training datasets developed by Hartill et al. 2023 to train a smaller Reasoning model such that it becomes proficient at utilising relevant information from longer text sequences that may be only partially evidential and frequently contain many irrelevant sentences. We find that both methods significantly improve results. Our single best Reasoning model materially improves upon strong comparable prior baselines for unseen evaluation datasets (StrategyQA 58.9 $\rightarrow$ 61.7 acc., CommonsenseQA 63.6 $\rightarrow$ 72.7 acc., ARC-DA 31.6 $\rightarrow$ 52.1 F1, IIRC 25.5 $\rightarrow$ 27.3 F1) and a version utilising our prior knowledge of each type of question in selecting a context combination strategy does even better. Our proposed models also generally outperform direct prompts against much larger models (BLOOM 175B and StableVicuna 13B) in both few-shot chain-of-thought and standard few-shot settings.
LGAug 14, 2022
A Theory for Knowledge Transfer in Continual LearningDiana Benavides-Prado, Patricia Riddle
Continual learning of a stream of tasks is an active area in deep neural networks. The main challenge investigated has been the phenomenon of catastrophic forgetting or interference of newly acquired knowledge with knowledge from previous tasks. Recent work has investigated forward knowledge transfer to new tasks. Backward transfer for improving knowledge gained during previous tasks has received much less attention. There is in general limited understanding of how knowledge transfer could aid tasks learned continually. We present a theory for knowledge transfer in continual supervised learning, which considers both forward and backward transfer. We aim at understanding their impact for increasingly knowledgeable learners. We derive error bounds for each of these transfer mechanisms. These bounds are agnostic to specific implementations (e.g. deep neural networks). We demonstrate that, for a continual learner that observes related tasks, both forward and backward transfer can contribute to an increasing performance as more tasks are observed.
CLJul 12, 2025Code
Psychology-Driven Enhancement of Humour TranslationYuchen Su, Yonghua Zhu, Yang Chen et al.
Humour translation plays a vital role as a bridge between different cultures, fostering understanding and communication. Although most existing Large Language Models (LLMs) are capable of general translation tasks, these models still struggle with humour translation, which is especially reflected through linguistic interference and lacking humour in translated text. In this paper, we propose a psychology-inspired Humour Decomposition Mechanism (HDM) that utilises Chain-of-Thought (CoT) to imitate the ability of the human thought process, stimulating LLMs to optimise the readability of translated humorous texts. Moreover, we integrate humour theory in HDM to further enhance the humorous elements in the translated text. Our automatic evaluation experiments on open-source humour datasets demonstrate that our method significantly improves the quality of humour translation, yielding average gains of 7.75\% in humour, 2.81\% in fluency, and 6.13\% in coherence of the generated text.
71.2SDMar 19
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language ModelsYuchen Su, Shaoxin Zhong, Yonghua Zhu et al.
Puns represent a typical linguistic phenomenon that exploits polysemy and phonetic ambiguity to generate humour, posing unique challenges for natural language understanding. Within pun research, audio plays a central role in human communication except text and images, while datasets and systematic resources for spoken puns remain scarce, leaving this crucial modality largely underexplored. In this paper, we present APUN-Bench, the first benchmark dedicated to evaluating large audio language models (LALMs) on audio pun understanding. Our benchmark contains 4,434 audio samples annotated across three stages: pun recognition, pun word location and pun meaning inference. We conduct a deep analysis of APUN-Bench by systematically evaluating 10 state-of-the-art LALMs, uncovering substantial performance gaps in recognizing, localizing, and interpreting audio puns. This analysis reveals key challenges, such as positional biases in audio pun location and error cases in meaning inference, offering actionable insights for advancing humour-aware audio intelligence.
CLJul 7, 2025
A Survey of Pun Generation: Datasets, Evaluations and MethodologiesYuchen Su, Yonghua Zhu, Ruofan Wang et al.
Pun generation seeks to creatively modify linguistic elements in text to produce humour or evoke double meanings. It also aims to preserve coherence and contextual appropriateness, making it useful in creative writing and entertainment across various media and contexts. Although pun generation has received considerable attention in computational linguistics, there is currently no dedicated survey that systematically reviews this specific area. To bridge this gap, this paper provides a comprehensive review of pun generation datasets and methods across different stages, including conventional approaches, deep learning techniques, and pre-trained language models. Additionally, we summarise both automated and human evaluation metrics used to assess the quality of pun generation. Finally, we discuss the research challenges and propose promising directions for future work.
CLMay 5, 2023
Neuromodulation Gated TransformerKobe Knowles, Joshua Bensemann, Diana Benavides-Prado et al.
We introduce a novel architecture, the Neuromodulation Gated Transformer (NGT), which is a simple implementation of neuromodulation in transformers via a multiplicative effect. We compare it to baselines and show that it results in the best average performance on the SuperGLUE benchmark validation sets.