HEAD-QA: A Healthcare Dataset for Complex Reasoning
HEAD-QA provides a benchmark for advancing complex reasoning in AI, particularly in healthcare, though the contribution is incremental because it centers on dataset creation.
The authors address complex reasoning in question answering by introducing HEAD-QA, a healthcare dataset drawn from Spanish exams, and find that current methods lag well behind human performance.
We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. We then consider monolingual (Spanish) and cross-lingual (to English) experiments with information retrieval and neural techniques. We show that: (i) HEAD-QA challenges current methods, and (ii) the results lag well behind human performance, demonstrating its usefulness as a benchmark for future work.
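To make the setup concrete, here is a minimal sketch of how a retrieval-style baseline for multi-choice questions might be scored. It is not the authors' system: the token-overlap scoring, the function names, the `question`/`options`/`answer` record layout, and the two-sentence toy corpus are all assumptions made for illustration, and the toy questions are not taken from HEAD-QA.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def overlap_score(query_tokens, doc_tokens):
    """Count the distinct query tokens that also occur in the document."""
    return len(set(query_tokens) & set(doc_tokens))

def answer_question(question, options, corpus):
    """Return the index of the option whose question+option query
    best matches any single document in the corpus."""
    docs = [tokenize(d) for d in corpus]
    best_idx, best_score = 0, -1
    for i, option in enumerate(options):
        query = tokenize(question + " " + option)
        score = max((overlap_score(query, d) for d in docs), default=0)
        if score > best_score:  # ties keep the earliest option
            best_idx, best_score = i, score
    return best_idx

def accuracy(examples, corpus):
    """Fraction of examples whose predicted option index equals the gold index."""
    correct = sum(
        answer_question(e["question"], e["options"], corpus) == e["answer"]
        for e in examples
    )
    return correct / len(examples)

# Hypothetical toy data, for illustration only (not from HEAD-QA).
corpus = [
    "Insulin is a hormone produced by the pancreas that lowers blood glucose.",
    "The femur is the longest bone in the human body.",
]
examples = [
    {"question": "Which organ produces insulin?",
     "options": ["Liver", "Pancreas", "Kidney", "Spleen"],
     "answer": 1},
    {"question": "What is the longest bone in the human body?",
     "options": ["Tibia", "Humerus", "Femur", "Radius"],
     "answer": 2},
]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(examples, corpus):.2f}")
```

Lexical matching of this kind can only succeed when the answer is stated almost verbatim in a retrieved document, which is one plausible reason questions that require multi-step reasoning remain hard for retrieval-based methods.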