CL · Mar 9, 2023

ChatGPT may Pass the Bar Exam soon, but has a Long Way to Go for the LexGLUE benchmark

arXiv:2304.12202

Originality: Synthesis-oriented

AI Analysis

This work assesses ChatGPT's capabilities on legal NLP tasks, highlighting its potential while showing that it remains well short of solving a domain-specific benchmark.

The study evaluated ChatGPT's zero-shot performance on the LexGLUE legal benchmark, finding an average micro-F1 score of 47.6%, with high scores of up to 70.2% on specific datasets, though overall performance remains limited.

Following the hype around OpenAI's ChatGPT conversational agent, the latest step in the recent development of Large Language Models (LLMs) that demonstrate emergent, unprecedented zero-shot capabilities, we audit OpenAI's latest GPT-3.5 model, 'gpt-3.5-turbo', the first available ChatGPT model, on the LexGLUE benchmark in a zero-shot fashion, providing examples in a templated instruction-following format. The results indicate that ChatGPT achieves an average micro-F1 score of 47.6% across LexGLUE tasks, surpassing the baseline guessing rates. Notably, the model performs exceptionally well on some datasets, achieving micro-F1 scores of 62.8% and 70.2% on the ECtHR B and LEDGAR datasets, respectively. The code base and model predictions are available for review at https://github.com/coastalcph/zeroshot_lexglue.
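To make the evaluation setup above more concrete, here is a minimal Python sketch of a templated zero-shot prompt and a micro-F1 computation over multi-label predictions. It is an illustration under stated assumptions, not the authors' implementation: the function names `build_prompt` and `micro_f1`, the template wording, and the toy labels are all hypothetical, and the actual templates live in the linked repository.

```python
from typing import List, Set


def build_prompt(text: str, labels: List[str]) -> str:
    """Fill a hypothetical instruction template (not the paper's wording)."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        f'Given the following text:\n"{text}"\n'
        f"Which of the following labels apply?\n{options}\n"
        "Answer:"
    )


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all examples and labels."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # in gold but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical labels, not LexGLUE classes).
gold = [{"A"}, {"B", "C"}, {"A"}]
pred = [{"A"}, {"B"}, {"C"}]
score = micro_f1(gold, pred)  # TP=2, FP=1, FN=2, so F1 = 4/7
prompt = build_prompt("Example clause text.", ["A", "B", "C"])
```

Because micro-averaging pools counts before computing F1, frequent labels weigh more than rare ones, so a micro-F1 figure can look quite different from a per-label (macro) average on skewed label distributions.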
