CL, Apr 14, 2023
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Andreas Köpf, Yannic Kilcher, Dimitri von Rütte et al.
Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption as demonstrated by ChatGPT. Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the required skill and domain knowledge to effectively harness the capabilities of LLMs, increasing their accessibility and utility across various domains. However, state-of-the-art alignment techniques like RLHF rely on high-quality human feedback data, which is expensive to create and often remains proprietary. In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations, a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 complete and fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort involving over 13,500 volunteers. Models trained on OpenAssistant Conversations show consistent improvements on standard benchmarks over respective base models. We release our code and data under a fully permissive licence.
CL, Mar 4, 2024
An Improved Traditional Chinese Evaluation Suite for Foundation Model
Zhi-Rui Tam, Ya-Ting Pai, Yen-Wei Lee et al.
We present TMMLU+, a new benchmark designed for Traditional Chinese language understanding. TMMLU+ is a multiple-choice question-answering dataset with 66 subjects from elementary to professional level. It is six times larger and boasts a more balanced subject distribution than its predecessor, Taiwan Massive Multitask Language Understanding (TMMLU). We also benchmark closed-source models and 26 open-weight Chinese large language models (LLMs) with parameter counts ranging from 1.8B to 72B on the proposed TMMLU+. Our findings reveal that (1) Traditional Chinese models still trail behind their Simplified Chinese counterparts, highlighting a need for more focused advancements in LLMs catering to Traditional Chinese. (2) Current LLMs still fall short of human performance in average scores, indicating a potential need for future research to delve deeper into social science and humanities subjects. (3) Among all the tokenization compression metrics examined, we identify that only the fertility score uniquely demonstrates strong correlations with our benchmark results. We foresee that TMMLU+ will pinpoint areas for future model improvement, thereby narrowing the gap between machine and human linguistic capabilities and supporting researchers in developing Traditional Chinese LLMs. Our dataset, along with the benchmark source code, is accessible at huggingface.co/datasets/ikala/tmmluplus.
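The abstract names the fertility score without defining it. As a rough illustration only, the sketch below uses one common definition, the average number of tokens a tokenizer emits per word (or per character, for text without whitespace); the paper's exact definition may differ. The names `fertility` and `toy_tokenize` are hypothetical, and the fixed-length chunking tokenizer is a stand-in for a real subword tokenizer.

```python
def fertility(texts, tokenize, split_units=str.split):
    """Average number of tokens per unit (word by default).

    Assumption: fertility = total tokens / total units. This is a common
    definition, not necessarily the one used in the TMMLU+ paper.
    """
    total_tokens = 0
    total_units = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_units += len(split_units(text))
    if total_units == 0:
        raise ValueError("texts contain no units to count")
    return total_tokens / total_units


def toy_tokenize(text, piece_len=3):
    """Hypothetical stand-in tokenizer: cuts each whitespace-separated
    chunk into pieces of at most `piece_len` characters."""
    pieces = []
    for chunk in text.split():
        pieces.extend(chunk[i:i + piece_len] for i in range(0, len(chunk), piece_len))
    return pieces


# Whitespace-delimited text: count tokens per word.
english = fertility(["hello world"], toy_tokenize)

# Text without whitespace: count tokens per character instead.
chinese = fertility(["繁體中文"], toy_tokenize, split_units=list)
```

Under this definition a lower value means the tokenizer compresses the text into fewer tokens per unit; swapping `toy_tokenize` for a real tokenizer's encode function would give a comparable number for an actual model.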
LG, Sep 6, 2021
Gradient Normalization for Generative Adversarial Networks
Yi-Lun Wu, Hong-Han Shuai, Zhi-Rui Tam et al.
In this paper, we propose a novel normalization method called gradient normalization (GN) to tackle the training instability of Generative Adversarial Networks (GANs) caused by the sharp gradient space. Unlike existing work such as gradient penalty and spectral normalization, the proposed GN only imposes a hard 1-Lipschitz constraint on the discriminator function, which increases the capacity of the discriminator. Moreover, the proposed gradient normalization can be applied to different GAN architectures with little modification. Extensive experiments on four datasets show that GANs trained with gradient normalization outperform existing methods in terms of both Frechet Inception Distance and Inception Score.
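The abstract does not state the normalization formula. The sketch below assumes the form f_hat(x) = f(x) / (||grad f(x)|| + |f(x)|), which should be checked against the paper before relying on it. It uses central finite differences instead of automatic differentiation so that it runs with the standard library alone, and `toy_discriminator` is a hypothetical piecewise-linear function, not a trained network.

```python
import math


def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def grad_norm(f, x):
    """Euclidean norm of the (numerical) gradient of f at x."""
    return math.sqrt(sum(g * g for g in numerical_grad(f, x)))


def gradient_normalize(f):
    """Wrap f as f_hat(x) = f(x) / (||grad f(x)|| + |f(x)|).

    Assumption: this is the normalization form; the abstract does not give it.
    """
    def f_hat(x):
        value = f(x)
        denom = grad_norm(f, x) + abs(value)
        # Guard for the degenerate case where both terms vanish.
        return 0.0 if denom == 0 else value / denom
    return f_hat


def toy_discriminator(x):
    """Hypothetical piecewise-linear (ReLU-style) discriminator on 2-D inputs."""
    relu = lambda v: max(0.0, v)
    return 4.0 * relu(x[0] + x[1]) - 2.0 * relu(x[0] - x[1]) + 1.0


f_hat = gradient_normalize(toy_discriminator)
points = [[1.0, 0.5], [-2.0, 0.3], [3.0, -1.0]]
outputs = [f_hat(p) for p in points]
slopes = [grad_norm(f_hat, p) for p in points]
```

For a piecewise-linear function like this one, evaluated away from its kinks, the normalized output stays within [-1, 1] and its gradient norm stays at or below 1, which is the sense in which the constraint is 1-Lipschitz in this sketch.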