Oliver Johnson

IT
4papers
320citations
Novelty45%
AI Score44

4 Papers

57.7ITMay 14
Group Testing: An Information Theory Perspective

Matthew Aldridge, Oliver Johnson, Jonathan Scarlett

The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the {\em rate} of group testing, indicating the amount of information learned per test. For the noiseless setting, we present a series of results leading to optimal rates, which in turn imply optimality and suboptimality results of various algorithms depending on the sparsity regime. We also survey analogous developments in noisy settings. In addition, we survey results concerning a number of variations on the standard group testing problem, including approximate recovery criteria, adaptive algorithms with a limited number of stages, sublinear-time algorithms, and settings with additional prior information, among others.

CLDec 15, 2022
Saved You A Click: Automatically Answering Clickbait Titles

Oliver Johnson, Beicheng Lou, Janet Zhong et al.

Often clickbait articles have a title that is phrased as a question or vague teaser that entices the user to click on the link and read the article to find the explanation. We developed a system that will automatically find the answer or explanation of the clickbait hook from the website text so that the user does not need to read through the text themselves. We fine-tune an extractive question and answering model (RoBERTa) and an abstractive one (T5), using data scraped from the 'StopClickbait' Facebook pages and Reddit's 'SavedYouAClick' subforum. We find that both extractive and abstractive models improve significantly after finetuning. We find that the extractive model performs slightly better according to ROUGE scores, while the abstractive one has a slight edge in terms of BERTscores.

12.0ITApr 15
A discrete Benamou-Brenier formulation of Optimal Transport on graphs

Kieran Morris, Oliver Johnson

We propose a discrete transport equation on graphs which connects distributions on both vertices and edges. We then derive a discrete analogue of the Benamou-Brenier formulation for Wasserstein-$1$ distance on a graph and as a result classify all $W_1$ geodesics on graphs.

ITJun 14, 2017
A strong converse bound for multiple hypothesis testing, with applications to high-dimensional estimation

Ramji Venkataramanan, Oliver Johnson

In statistical inference problems, we wish to obtain lower bounds on the minimax risk, that is to bound the performance of any possible estimator. A standard technique to obtain risk lower bounds involves the use of Fano's inequality. In an information-theoretic setting, it is known that Fano's inequality typically does not give a sharp converse result (error lower bound) for channel coding problems. Moreover, recent work has shown that an argument based on binary hypothesis testing gives tighter results. We adapt this technique to the statistical setting, and argue that Fano's inequality can always be replaced by this approach to obtain tighter lower bounds that can be easily computed and are asymptotically sharp. We illustrate our technique in three applications: density estimation, active learning of a binary classifier, and compressed sensing, obtaining tighter risk lower bounds in each case.