74.1PLMay 18
Contextual MetaML: Syntax and Full AbstractionHaoxuan Yin, Andrzej S. Murawski, C. -H. Luke Ong
MetaML-style metaprogramming languages allow programmers to construct, manipulate and run code. In the presence of higher-order references for code, ensuring type safety is challenging, as free variables can escape their binders. In this paper, we present Contextual MetaML, \textit{the first metaprogramming language that supports storing and running open code under a strong type safety guarantee}. The type system utilises contextual modal types to track and reason about free variables in code explicitly. A crucial concern in metaprogramming-based program optimisations is whether the optimised program preserves the meaning of the original program. Addressing this question requires a notion of program equivalence and techniques to reason about it. In this paper, we provide a semantic model that captures contextual equivalence for Contextual MetaML, establishing \textit{the first full abstraction result for an imperative MetaML-style language}. Our model is based on traces derived via operational game semantics, where the meaning of a program is modelled by its possible interactions with the environment. We also establish a novel closed instances of use theorem that accounts for both call-by-value and call-by-name closing substitutions.
33.2CLMar 18Code
KA2L: A Knowledge-Aware Active Learning Framework for LLMsHaoxuan Yin, Bojian Liu, Chen Tang et al.
Fine-tuning large language models (LLMs) with high-quality knowledge has been shown to enhance their performance effectively. However, there is a paucity of research on the depth of domain-specific knowledge comprehension by LLMs and the application of targeted active learning to improve their expertise. To address this gap, we introduce the Knowledge-Aware Active Learning (KA2L) framework. This framework assesses LLMs' mastery of specific knowledge points to aid in constructing unanswerable or unknowable questions through latent space analysis. This active learning strategy enhances training efficiency by focusing on knowledge the model has yet to master, thereby minimizing redundancy in learning already acquired information. This study innovatively employs a knowledge distribution probing technique to examine the hidden states of specific Transformer layers and identify the distribution of known and unknown knowledge within the LLM. Additionally, a hidden-state decoding method is proposed to generate numerous unknown questions in natural language from the latent knowledge space. In our experiments, we selected nine open-source LLMs to validate the effectiveness of the proposed framework. Results indicate that KA2L not only significantly reduces 50% annotation and computation costs across two open-domain and one vertical-domain dataset but also achieves better performance, offering valuable insights into active learning strategies for LLMs. The code is available at https://anonymous.4open.science/r/KA2L-F15C.