AI LG
May 7, 2023

Professional Certification Benchmark Dataset: The First 500 Jobs For Large Language Models

arXiv:2305.05377v1
5.44 citations
Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need to assess the employable skills of AI models through professional certification exams, though it is incremental: it evaluates existing models on a new dataset.

The research evaluates the vocational readiness of large language models by testing GPT-3 and Turbo-GPT3.5 on a benchmark of 1149 professional certifications. GPT-3 passed 39% of the certifications without fine-tuning, and Turbo-GPT3.5 scored 100% on the OSCP exam.

The research creates a professional certification survey to test large language models and evaluate their employable skills. It compares the performance of two AI models, GPT-3 and Turbo-GPT3.5, on a benchmark dataset of 1149 professional certifications, emphasizing vocational readiness rather than academic performance. GPT-3 achieved a passing score (>70% correct) in 39% of the professional certifications without fine-tuning or exam preparation. The models demonstrated qualifications in various computer-related fields, such as cloud and virtualization, business analytics, cybersecurity, network setup and repair, and data analytics. Turbo-GPT3.5 scored 100% on the valuable Offensive Security Certified Professional (OSCP) exam. The models also displayed competence in other professional domains, including nursing, licensed counseling, pharmacy, and teaching. Turbo-GPT3.5 passed the Financial Industry Regulatory Authority (FINRA) Series 6 exam with a 70% grade without preparation. Interestingly, Turbo-GPT3.5 performed well on customer service tasks, suggesting potential applications in human augmentation for chatbots in call centers and routine advice services. The models also scored well on sensory and experience-based tests such as wine sommelier, beer taster, emotional quotient, and body language reader. The improvement in OpenAI models from Babbage to Turbo raised graded performance by a median of 60% in less than a few years. This progress suggests that focusing on the latest model's shortcomings could lead to a highly performant AI capable of mastering the most demanding professional certifications. We open-source the benchmark to expand the range of testable professional skills as the models improve or gain emergent capabilities.
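
The headline figures above (a pass mark of more than 70% correct, a share of certifications passed, and a median improvement between models) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the strict reading of ">70%", the use of relative (rather than absolute) gain for the median improvement, and the example scores are all assumptions made for illustration.

```python
from statistics import median

# Assumption: "passing" means strictly more than 70% correct, following the
# ">70% correct" wording in the abstract.
PASS_THRESHOLD = 0.70


def passed(fraction_correct: float, threshold: float = PASS_THRESHOLD) -> bool:
    """Return True if an exam score (fraction correct, 0..1) clears the pass mark."""
    return fraction_correct > threshold


def pass_rate(scores: dict[str, float], threshold: float = PASS_THRESHOLD) -> float:
    """Share of certifications passed, given a mapping of exam name to fraction correct."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(passed(s, threshold) for s in scores.values()) / len(scores)


def median_relative_gain(old: dict[str, float], new: dict[str, float]) -> float:
    """Median relative improvement of `new` over `old` on the exams both models took.

    Assumption: the gain is computed per exam as (new - old) / old; exams where
    the old score is zero are skipped to avoid division by zero.
    """
    shared = [name for name in old if name in new and old[name] > 0]
    return median((new[name] - old[name]) / old[name] for name in shared)


# Hypothetical scores for illustration only; these are not results from the paper.
example_scores = {"exam_a": 0.82, "exam_b": 0.70, "exam_c": 0.55, "exam_d": 0.91}
print(pass_rate(example_scores))
```

Under the strict threshold, a score of exactly 70% does not count as a pass; switching `>` to `>=` in `passed` gives the inclusive reading.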
