CL · Apr 13

SHARE: Social-Humanities AI for Research and Education

João Gonçalves, Sonia de Jager, Petr Knoth, David Pride, Nick Jelicic

arXiv:2604.11152 · 71.2 · h-index: 10

Predicted impact: top 31% in CL · last 90 days
Originality: Incremental advance

AI Analysis

This work provides domain-specific language models and a non-generative interface for SSH researchers, addressing the need for AI tools that respect disciplinary norms, though the performance gains are incremental compared to general models.

The authors present SHARE, the first causal language models pretrained specifically for social sciences and humanities (SSH), achieving performance close to general-purpose models like Phi-4 while using 100 times fewer tokens. They also introduce MIRROR, a user interface that reviews text inputs without generating text, aiming to align AI with SSH principles.

This intermediate technical report introduces the SHARE family of base models and the MIRROR user interface. The SHARE models are the first causal language models fully pretrained by and for the social sciences and humanities (SSH). On our custom SSH Cloze benchmark, their performance in modelling SSH texts is close to that of general-purpose models (Phi-4), which use 100 times more tokens. The MIRROR user interface is designed for reviewing text inputs from the SSH disciplines while preserving critical engagement. By prototyping a generative AI interface that does not generate any text, we propose a way to harness the capabilities of the SHARE models without compromising the integrity of SSH principles and norms.
