The Fact Extraction and VERification (FEVER) Shared Task
This work addresses automated fact-checking of general-knowledge claims; its contribution is incremental, as it builds on existing shared task frameworks.
The paper tackles the problem of verifying human-written factoid claims against Wikipedia evidence; the best system achieved a FEVER score of 64.21%.
We present the results of the first Fact Extraction and VERification (FEVER) Shared Task. The task challenged participants to classify whether human-written factoid claims could be Supported or Refuted using evidence retrieved from Wikipedia. We received entries from 23 competing teams, 19 of which scored higher than the previously published baseline. The best-performing system achieved a FEVER score of 64.21%. Alongside the results, we summarize the participating systems, highlighting their commonalities and innovations.