C. Jess Riedel

CL · Sep 28, 2021

RAFT: A Real-World Few-Shot Text Classification Benchmark

Neel Alex, Eli Lifland, Lewis Tunstall et al.

Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas current techniques struggle with: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits at https://raft.elicit.org.

QUANT-PH · Mar 29, 2013

On the security of key distribution based on Johnson-Nyquist noise

Charles H. Bennett, C. Jess Riedel

We point out that arguments for the security of Kish's noise-based cryptographic protocol have relied on an unphysical no-wave limit, which if taken seriously would prevent any correlation from developing between the users. We introduce a noiseless version of the protocol, also having illusory security in the no-wave limit, to show that noise and thermodynamics play no essential role. Then we prove generally that classical electromagnetic protocols cannot establish a secret key between two parties separated by a spacetime region perfectly monitored by an eavesdropper. We note that the original protocol of Kish is vulnerable to passive time-correlation attacks even in the quasi-static limit. Finally we show that protocols of this type can be secure in practice against an eavesdropper with noisy monitoring equipment. In this case the security is a straightforward consequence of Maurer and Wolf's discovery that key can be distilled by public discussion from correlated random variables in a wide range of situations where the eavesdropper's noise is at least partly independent from the users' noise.

2 Papers