Rob Knight

QM, Nov 30, 2020

Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data

Lingjing Jiang, Niina Haiminen, Anna-Paola Carrieri et al.

Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging because microbiome data sets are high-dimensional, underdetermined, sparse and compositional. Great efforts have recently been made to develop new feature selection methods that handle these data characteristics, but almost all of them were evaluated on the performance of model predictions. Little attention has been paid to a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated on prediction accuracy alone. If tiny changes to the training data lead to large changes in the chosen feature subset, then many of the biological features that an algorithm has found are likely to be data artifacts rather than real biological signal. This crucial need to identify relevant and reproducible features motivates reproducibility evaluation criteria such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the popular model prediction metric, mean squared error (MSE), with the proposed reproducibility criterion, Stability, in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications. We conclude that Stability is preferable to MSE as a feature selection criterion because it better quantifies the reproducibility of the feature selection method.
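To make the idea of robustness to data perturbations concrete, here is a minimal sketch of one simple way to score the stability of a feature selector: the mean pairwise Jaccard similarity between the feature subsets chosen on different resamples of the training data. This is an illustrative stand-in and not necessarily the Stability estimator used in the paper; the function names and the taxon labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(selected_sets):
    """Mean pairwise Jaccard similarity across feature subsets.

    Each element of `selected_sets` is the subset of features a method
    selected on one perturbed (e.g. bootstrapped) copy of the data.
    Illustrative measure only; an assumption, not the paper's estimator.
    """
    pairs = list(combinations(selected_sets, 2))
    if not pairs:
        raise ValueError("need at least two selected feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three resamples of the same data set.
stable_runs = [{"taxon_A", "taxon_B"}] * 3
unstable_runs = [
    {"taxon_A", "taxon_B"},
    {"taxon_C", "taxon_D"},
    {"taxon_A", "taxon_E"},
]

print(selection_stability(stable_runs))    # identical subsets every time
print(selection_stability(unstable_runs))  # subsets change with the data
```

A method that returns the same subset on every resample scores 1, while one whose subsets barely overlap scores near 0, regardless of how well either predicts the outcome.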

HC, Sep 2, 2016

Integrating citizen science with online learning to ask better questions

Vineet Pandey, Scott Klemmer, Amnon Amir et al.

Online learners spend millions of hours per year testing their new skills on assignments with known answers. This paper explores whether framing research questions as assignments with unknown answers helps learners generate novel, useful, and difficult-to-find knowledge while increasing their motivation by contributing to a larger goal. Collaborating with the American Gut Project, the world's largest crowdfunded citizen science project, we deploy Gut Instinct to allow novices to generate hypotheses about the constitution of the human gut microbiome. The tool enables online learners to explore learning material about the microbiome and create their own theories about the causes of variation in the microbiome. Building on crowdsourcing or serious games that use people as replaceable units, this work-in-progress lays out our plans for how people (a) use their personal knowledge (b) toward solving a larger real-world goal (c) that can provide potential benefits to them. We hope to demonstrate that Gut Instinct citizen scientists generate useful hypotheses, perform better on learning tasks than traditional MOOC learners, and are more engaged with the learning material.

