Elsevier Arena: Human Evaluation of Chemistry/Biology/Health Foundational Large Language Models
Camilo Thorne, Christian Druckenbrodt, Kinga Szarkowska, Deepika Goyal, Pranita Marajan, Vijay Somanath, Corey Harper, Mao Yan, Tony Scerri
arXiv:2409.05486v2
Originality: Synthesis-oriented
AI Analysis
This work would have addressed the need for human evaluation of AI models in scientific domains, but because the paper was removed, the available record is incomplete and its contribution cannot be assessed beyond an incremental one.
The paper aimed to evaluate foundational large language models in chemistry, biology, and health. This version (v2) was removed from arXiv over a licensing issue, so no results or numbers are available.
arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission.