Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
Reactor Mk.1 is presented as a competitive model for reasoning and other difficult tasks, though the contribution appears incremental, since the paper benchmarks an existing model type.
The paper benchmarks the Reactor Mk.1 large language model and reports that it outperforms models such as GPT-4o, scoring 92% on MMLU, 91% on HumanEval, and 88% on BBH.
The paper presents the performance results of Reactor Mk.1, ARC's flagship large language model, through an analysis of its benchmarking process. The model uses the Lychee AI engine and has fewer than 100 billion parameters, combining efficiency with capability. Reactor Mk.1 outperformed models such as GPT-4o, Claude Opus, and Llama 3, scoring 92% on the MMLU dataset, 91% on the HumanEval dataset, and 88% on the BBH dataset. It excels at both reasoning and handling difficult tasks, establishing itself as a prominent solution among current state-of-the-art AI systems.
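As a rough illustration of what a figure such as 92% on MMLU means, the sketch below computes exact-match accuracy over multiple-choice answers. The function name and the toy data are hypothetical; the summary above does not describe the evaluation harness used for Reactor Mk.1, and HumanEval is normally scored by running unit tests (pass@k) rather than by exact match.

```python
def accuracy_percent(predictions, references):
    """Return exact-match accuracy as a percentage (hypothetical helper)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty benchmark")
    # Count the items where the predicted choice equals the gold choice.
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Toy data for illustration only; these are not real benchmark outputs.
references = ["A", "C", "B", "D", "A"]
predictions = ["A", "C", "B", "D", "B"]

score = accuracy_percent(predictions, references)
print(f"{score:.0f}%")  # 4 of 5 correct -> prints "80%"
```

A reported benchmark score is this kind of ratio taken over the full test set, so comparisons between models are only meaningful when the prompts, answer extraction, and test split are the same.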