AP Jun 17, 2022
Scaling multi-species occupancy models to large citizen science datasets
Martin Ingram, Damjan Vukcevic, Nick Golding
Citizen science datasets can be very large and promise to improve species distribution modelling, but detection is imperfect, risking bias when fitting models. In particular, observers may not detect species that are actually present. Occupancy models can estimate and correct for this observation process, and multi-species occupancy models exploit similarities in the observation process, which can improve estimates for rare species. However, the computational methods currently used to fit these models do not scale to large datasets. We develop approximate Bayesian inference methods and use graphics processing units (GPUs) to scale multi-species occupancy models to very large citizen science data. We fit multi-species occupancy models to one month of data from the eBird project consisting of 186,811 checklist records comprising 430 bird species. We evaluate the predictions on a spatially separated test set of 59,338 records, comparing two different inference methods -- Markov chain Monte Carlo (MCMC) and variational inference (VI) -- to occupancy models fitted to each species separately using maximum likelihood. We fitted models to the entire dataset using VI, and up to 32,000 records with MCMC. VI fitted to the entire dataset performed best, outperforming single-species models on both AUC (90.4% compared to 88.7%) and on log likelihood (-0.080 compared to -0.085). We also evaluate how well range maps predicted by the model agree with expert maps. We find that modelling the detection process greatly improves agreement and that the resulting maps agree as closely with expert maps as ones estimated using high quality survey data. Our results demonstrate that multi-species occupancy models are a compelling approach to model large citizen science datasets, and that, once the observation process is taken into account, they can model species distributions accurately.
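As an illustration of how an occupancy model separates presence from detection, here is a minimal sketch of the likelihood for one site and one species. It assumes a constant occupancy probability `psi` and a constant per-visit detection probability `p`; these names and the function are hypothetical. The models in the abstract are multi-species and fitted with VI or MCMC, which this sketch does not attempt to reproduce.

```python
def site_likelihood(detections, psi, p):
    """Likelihood of one site's detection history under a basic occupancy model.

    detections: list of 0/1 outcomes, one per visit (for example, per checklist).
    psi: probability that the species occupies the site (assumed constant).
    p: probability of detecting the species on a visit, given that it is present.
    """
    # Probability of this exact history if the species is present.
    present = psi
    for y in detections:
        present *= p if y else (1.0 - p)

    # If the species is absent, only an all-zero history is possible.
    absent = (1.0 - psi) if not any(detections) else 0.0

    return present + absent


if __name__ == "__main__":
    # Two visits, no detections: the species may be absent, or present but missed.
    print(site_likelihood([0, 0], psi=0.5, p=0.5))
    # One detection rules out absence.
    print(site_likelihood([1, 0], psi=0.5, p=0.5))
```

The all-zero case shows why ignoring detection biases estimates: a site with no detections still contributes probability mass to "present but missed".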
CY Mar 19, 2025
3+ Seat Risk-Limiting Audits for Single Transferable Vote Elections
Michelle Blom, Alexander Ek, Peter J. Stuckey et al.
Constructing efficient risk-limiting audits (RLAs) for multiwinner single transferable vote (STV) elections is a challenging problem. An STV RLA is designed to statistically verify that the reported winners of an election did indeed win according to the voters' expressed preferences and not due to mistabulation or interference, while limiting the risk of accepting an incorrect outcome to a desired threshold (the risk limit). Existing methods have shown that it is possible to form RLAs for two-seat STV elections in the context where the first seat has been awarded to a candidate in the first round of tabulation. This is called the first winner criterion. We present an assertion-based approach to conducting full or partial RLAs for STV elections with three or more seats, in which the first winner criterion is satisfied. Although the chance of forming a full audit that verifies all winners drops substantially as the number of seats increases, we show that we can quite often form partial audits that verify most, and sometimes all, of the reported winners. We evaluate our method on a dataset of over 500 three- and four-seat STV elections from the 2017 and 2022 local council elections in Scotland.
CY Mar 26
Doing More With Less: Mismatch-Based Risk-Limiting Audits
Alexander Ek, Michelle Blom, Philip B. Stark et al.
One approach to risk-limiting audits (RLAs) compares randomly selected cast vote records (CVRs) to votes read by human auditors from the corresponding ballot cards. Historically, such methods reduce audit sample sizes by considering how each sampled CVR differs from the corresponding true vote, not merely whether they differ. Here we investigate the latter approach, auditing by testing whether the total number of mismatches in the full set of CVRs exceeds the minimum number of CVR errors required for the reported outcome to be wrong (the "CVR margin"). This strategy makes it possible to audit more social choice functions and simplifies RLAs conceptually, which makes it easier to explain than some other RLA approaches. The cost is larger sample sizes. "Mismatch-based RLAs" only require a lower bound on the CVR margin, which for some social choice functions is easier to calculate than the effect of particular errors. When the population rate of mismatches is low and the lower bound on the CVR margin is close to the true CVR margin, the increase in sample size is small. However, the increase may be very large when the errors include ones that, if corrected, would widen the CVR margin rather than narrow it; ones that affect the margin between candidates other than the reported winner with the fewest votes and the reported loser with the most votes; or ones that affect different margins.
CY Dec 18, 2021
A First Approach to Risk-Limiting Audits for Single Transferable Vote Elections
Michelle Blom, Peter J. Stuckey, Vanessa Teague et al.
Risk-limiting audits (RLAs) are an increasingly important method for checking that the reported outcome of an election is, in fact, correct. Indeed, their use is increasingly being legislated. While effective methods for RLAs have been developed for many forms of election -- for example: first-past-the-post, instant-runoff voting, and D'Hondt elections -- auditing methods for single transferable vote (STV) elections have yet to be developed. STV elections are notoriously hard to reason about since there is a complex interaction of votes that change their value throughout the process. In this paper we present the first approach to risk-limiting audits for STV elections, restricted to the case of 2-seat STV elections.
CY Jul 25, 2021
Assertion-Based Approaches to Auditing Complex Elections, with Application to Party-List Proportional Elections
Michelle Blom, Jurlind Budurushi, Ronald L. Rivest et al.
Risk-limiting audits (RLAs), an ingredient in evidence-based elections, are increasingly common. They are a rigorous statistical means of ensuring that electoral results are correct, usually without having to perform an expensive full recount -- at the cost of some controlled probability of error. A recently developed approach for conducting RLAs, SHANGRLA, provides a flexible framework that can encompass a wide variety of social choice functions and audit strategies. Its flexibility comes from reducing sufficient conditions for outcomes to be correct to canonical 'assertions' that have a simple mathematical form. Assertions have been developed for auditing various social choice functions including plurality, multi-winner plurality, super-majority, Hamiltonian methods, and instant runoff voting. However, there is no systematic approach to building assertions. Here, we show that assertions with linear dependence on transformations of the votes can easily be transformed to canonical form for SHANGRLA. We illustrate the approach by constructing assertions for party-list elections such as Hamiltonian free list elections and elections using the D'Hondt method, expanding the set of social choice functions to which SHANGRLA applies directly.
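To make the idea of a canonical assertion concrete, here is a minimal sketch for the simplest case, a plurality contest between a reported winner and one reported loser. It assumes the usual form in which an assertion states that the mean of a bounded, nonnegative function of the ballots (an "assorter") exceeds 1/2. The function names are hypothetical, and the check below runs over every ballot, whereas an audit would test the same claim statistically from a random sample.

```python
def plurality_assorter(ballot, winner, loser):
    """Map one ballot to a value in [0, 1] for the assertion 'winner beats loser'."""
    if ballot == winner:
        return 1.0
    if ballot == loser:
        return 0.0
    # Ballots for anyone else (or invalid ballots) are neutral.
    return 0.5


def assertion_holds(ballots, winner, loser):
    """Check over ALL ballots whether the assorter mean exceeds 1/2."""
    values = [plurality_assorter(b, winner, loser) for b in ballots]
    return sum(values) / len(values) > 0.5


if __name__ == "__main__":
    ballots = ["A", "A", "B", "C"]
    print(assertion_holds(ballots, winner="A", loser="B"))
    print(assertion_holds(ballots, winner="B", loser="A"))
```

The mean exceeds 1/2 exactly when the winner has more votes than the loser, so the contest outcome reduces to one such assertion per loser.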
CY Feb 17, 2021
Auditing Hamiltonian Elections
Michelle Blom, Philip B. Stark, Peter J. Stuckey et al.
Presidential primaries are a critical part of the United States Presidential electoral process, since they are used to select the candidates in the Presidential election. While methods differ by state and party, many primaries involve proportional delegate allocation using the so-called Hamilton method. In this paper we show how to conduct risk-limiting audits for delegate allocation elections using variants of the Hamilton method where the viability of candidates is determined either by a plurality vote or using instant runoff voting. Experiments on real-world elections show that we can audit primary elections to high confidence (small risk limits) usually at low cost.
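For readers unfamiliar with the Hamilton (largest remainder) method mentioned above, here is a minimal sketch of the basic allocation rule. It assumes every candidate passed in is already viable; the viability step and the audit itself are not shown. The function name and the example vote counts are hypothetical, and ties at the cutoff are broken by input order, which is an arbitrary choice made only for this illustration.

```python
def hamilton_allocation(votes, seats):
    """Allocate seats by the Hamilton (largest remainder) method.

    votes: dict mapping candidate name to vote count (viable candidates only).
    seats: total number of seats (delegates) to allocate.
    """
    total = sum(votes.values())
    if total <= 0 or seats < 0:
        raise ValueError("need a positive vote total and non-negative seats")

    # Integer arithmetic: votes * seats / total, split into whole part and remainder.
    allocation = {c: (v * seats) // total for c, v in votes.items()}
    remainders = {c: (v * seats) % total for c, v in votes.items()}

    # Hand the leftover seats to the largest remainders.
    # sorted() is stable, so ties keep input order (an arbitrary tie-break).
    leftover = seats - sum(allocation.values())
    by_remainder = sorted(votes, key=lambda c: remainders[c], reverse=True)
    for candidate in by_remainder[:leftover]:
        allocation[candidate] += 1

    return allocation


if __name__ == "__main__":
    example = {"A": 47000, "B": 16000, "C": 15900,
               "D": 12000, "E": 6000, "F": 3100}
    print(hamilton_allocation(example, seats=10))
```

Using integer quotients and remainders avoids floating-point rounding when vote shares fall close to a whole number of seats.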
AP Aug 19, 2020
A Unified Evaluation of Two-Candidate Ballot-Polling Election Auditing Methods
Zhuoqun Huang, Ronald L. Rivest, Philip B. Stark et al.
Counting votes is complex and error-prone. Several statistical methods have been developed to assess election accuracy by manually inspecting randomly selected physical ballots. Two 'principled' methods are risk-limiting audits (RLAs) and Bayesian audits (BAs). RLAs use frequentist statistical inference while BAs are based on Bayesian inference. Until recently, the two have been thought of as fundamentally different. We present results that unify and shed light upon 'ballot-polling' RLAs and BAs (which only require the ability to sample uniformly at random from all cast ballot cards) for two-candidate plurality contests, which are building blocks for auditing more complex social choice functions, including some preferential voting systems. We highlight the connections between the methods and explore their performance. First, building on a previous demonstration of the mathematical equivalence of classical and Bayesian approaches, we show that BAs, suitably calibrated, are risk-limiting. Second, we compare the efficiency of the methods across a wide range of contest sizes and margins, focusing on the distribution of sample sizes required to attain a given risk limit. Third, we outline several ways to improve performance and show how the mathematical equivalence explains the improvements.
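As a concrete example of a ballot-polling RLA for a two-candidate contest, here is a minimal sketch of a sequential likelihood-ratio test in the style of the well-known BRAVO method. It is offered as one illustration of the frequentist approach, not as a summary of the methods compared in the abstract. It assumes ballots are drawn uniformly at random with replacement and are labelled `"w"` (reported winner) or `"l"` (reported loser); the function name and labels are hypothetical.

```python
def bravo_stopping_size(sample, reported_share, risk_limit):
    """Return how many sampled ballots were needed to confirm the outcome.

    sample: iterable of "w" / "l" labels in the order the ballots were drawn.
    reported_share: reported winner's share of the two-candidate vote (> 0.5).
    risk_limit: maximum chance of confirming a wrong outcome, e.g. 0.05.
    Returns None if the sample runs out before the test can stop.
    """
    if not 0.5 < reported_share < 1.0:
        raise ValueError("reported_share must be strictly between 0.5 and 1")
    if not 0.0 < risk_limit < 1.0:
        raise ValueError("risk_limit must be strictly between 0 and 1")

    threshold = 1.0 / risk_limit
    ratio = 1.0  # likelihood ratio: reported result versus an exact tie
    for n, ballot in enumerate(sample, start=1):
        if ballot == "w":
            ratio *= reported_share / 0.5
        elif ballot == "l":
            ratio *= (1.0 - reported_share) / 0.5
        # Any other label is ignored in this two-candidate sketch.
        if ratio >= threshold:
            return n
    return None


if __name__ == "__main__":
    # Best case: every sampled ballot shows the reported winner.
    print(bravo_stopping_size(["w"] * 100, reported_share=0.6, risk_limit=0.05))
```

The number of ballots needed before stopping is random in practice, which is why comparisons of such methods focus on the distribution of sample sizes at a given risk limit.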