CL Aug 5, 2024
Pula: Training Large Language Models for Setswana
Nathan Brown, Vukosi Marivate
In this work we present Pula, a suite of bilingual language models proficient in both Setswana and English. Leveraging recent advancements in data availability and efficient fine-tuning, Pula 8B and Pula 14B outperform GPT-4o and Gemini 1.5 Pro on English-Setswana translation tasks and achieve state-of-the-art performance on Setswana reasoning tasks for their size. We release the weights for Pula 1B, 3B, 8B, and 14B as well as training logs and training and evaluation code. Alongside Pula, we release the largest-ever Setswana text corpus, Marothodi, and the first comprehensive Setswana instruction-tuning dataset, Medupi, consisting of reformatted datasets, translated corpora, and synthetic LLM-generated text. To accompany this data, we release the code used for dataset construction, formatting, filtering, and scraping. Last, we release two Setswana LLM-translated benchmarks, MMLU-tsn and GSM8K-tsn, to measure Setswana knowledge and reasoning capabilities.
CL Nov 22, 2023
Efficient Transformer Knowledge Distillation: A Performance Review
Nathan Brown, Ashton Williamson, Tahj Anderson et al.
As pretrained transformer language models continue to achieve state-of-the-art performance, the Natural Language Processing community has pushed for advances in model compression and efficient attention mechanisms to address high computational requirements and limited input sequence length. Despite these separate efforts, no investigation has examined the intersection of these two fields. In this work, we provide an evaluation of model compression via knowledge distillation on efficient attention transformers. We provide cost-performance trade-offs for the compression of state-of-the-art efficient attention architectures and the gains made in performance in comparison to their full attention counterparts. Furthermore, we introduce a new long-context Named Entity Recognition dataset, GONERD, to train and test the performance of NER models on long sequences. We find that distilled efficient attention transformers can retain much of the original model performance: up to 98.6% across short-context tasks (GLUE, SQuAD, CoNLL-2003), up to 94.6% across long-context question answering tasks (HotpotQA, TriviaQA), and up to 98.8% on long-context Named Entity Recognition (GONERD), while decreasing inference times by up to 57.8%. We find that, for most models on most tasks, knowledge distillation is an effective method to yield high-performing efficient attention models at low cost.
QM Nov 22, 2018
GuacaMol: Benchmarking Models for De Novo Molecular Design
Nathan Brown, Marco Fiscato, Marwin H. S. Segler et al.
De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks have appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies against well-established algorithms have seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models in reproducing the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The open-source Python benchmarking code and a leaderboard can be found at https://benevolent.ai/guacamol
CE Dec 16, 2025
A Survey of AI Methods for Geometry Preparation and Mesh Generation in Engineering Simulation
Steven Owen, Nathan Brown, Nikos Chrisochoides et al.
Artificial intelligence is beginning to reduce the manual effort in the CAD-to-mesh pipeline. Written for meshing and geometry practitioners with limited AI background, this survey organizes recent work by workflow step. We cover part classification and segmentation, mesh quality prediction, and defeaturing. We review AI guidance for unstructured meshing, block-structured meshing in 2D and 3D, and volumetric parameterization, including reconstruction from implicit or sampled geometry. We also discuss parallel mesh generation and scripting automation via reinforcement learning and large language models. Across these topics, AI complements established geometry and meshing algorithms rather than replacing them. We conclude with practical lessons and open challenges in data, benchmarks, and trustworthy integration.