Philip B. Stark

CR
14papers
195citations
Novelty40%
AI Score40

14 Papers

11.3CYMar 26
Doing More With Less: Mismatch-Based Risk-Limiting Audits

Alexander Ek, Michelle Blom, Philip B. Stark et al.

One approach to risk-limiting audits (RLAs) compares randomly selected cast vote records (CVRs) to votes read by human auditors from the corresponding ballot cards. Historically, such methods reduce audit sample sizes by considering how each sampled CVR differs from the corresponding true vote, not merely whether they differ. Here we investigate the latter approach, auditing by testing whether the total number of mismatches in the full set of CVRs exceeds the minimum number of CVR errors required for the reported outcome to be wrong (the "CVR margin"). This strategy makes it possible to audit more social choice functions and simplifies RLAs conceptually, which makes it easier to explain than some other RLA approaches. The cost is larger sample sizes. "Mismatch-based RLAs" only require a lower bound on the CVR margin, which for some social choice functions is easier to calculate than the effect of particular errors. When the population rate of mismatches is low and the lower bound on the CVR margin is close to the true CVR margin, the increase in sample size is small. However, the increase may be very large when errors include errors that, if corrected, would widen the CVR margin rather than narrow it; errors affect the margin between candidates other than the reported winner with the fewest votes and the reported loser with the most votes; or errors that affect different margins.

CYApr 1, 2020Code
You can do RLAs for IRV

Michelle Blom, Andrew Conway, Dan King et al.

The City and County of San Francisco, CA, has used Instant Runoff Voting (IRV) for some elections since 2004. This report describes the first ever process pilot of Risk Limiting Audits for IRV, for the San Francisco District Attorney's race in November, 2019. We found that the vote-by-mail outcome could be efficiently audited to well under the 0.05 risk limit given a sample of only 200 ballots. All the software we developed for the pilot is open source.

APNov 22, 2019Code
Sets of Half-Average Nulls Generate Risk-Limiting Audits: SHANGRLA

Philip B. Stark

Risk-limiting audits (RLAs) for many social choice functions can be reduced to testing sets of null hypotheses of the form "the average of this list is not greater than 1/2" for a collection of finite lists of nonnegative numbers. Such social choice functions include majority, super-majority, plurality, multi-winner plurality, Instant Runoff Voting (IRV), Borda count, approval voting, and STAR-Voting, among others. The audit stops without a full hand count iff all the null hypotheses are rejected. The nulls can be tested in many ways. Ballot-polling is particularly simple; two new ballot-polling risk-measuring functions for sampling without replacement are given. Ballot-level comparison audits transform each null into an equivalent assertion that the mean of re-scaled tabulation errors is not greater than 1/2. In turn, that null can then be tested using the same statistical methods used for ballot polling---but applied to different finite lists of nonnegative numbers. SHANGRLA comparison audits are more efficient than previous comparison audits for two reasons: (i) for most social choice functions, the conditions tested are both necessary and sufficient for the reported outcome to be correct, while previous methods tested conditions that were sufficient but not necessary, and (ii) the tests avoid a conservative approximation. The SHANGRLA abstraction simplifies stratified audits, including audits that combine ballot polling with ballot-level comparisons, producing sharper audits than the "SUITE" approach. SHANGRLA works with the "phantoms to evil zombies" strategy to treat missing ballot cards and missing or redacted cast vote records. That also facilitates sampling from "ballot-style manifests," which can dramatically improve efficiency when the audited contests do not appear on every ballot card. Open-source software implementing SHANGRLA ballot-level comparison audits is available.

MEJan 7, 2022
ALPHA: Audit that Learns from Previously Hand-Audited Ballots

Philip B. Stark

BRAVO, the most widely tried method for risk-limiting election audits, cannot accommodate sampling without replacement or stratified sampling, which can improve efficiency and may be required by law. It applies only to ballot-polling audits, which are less efficient than comparison audits. It applies to plurality, majority, super-majority, proportional representation, and ranked-choice voting contests, but not to many social choice functions for which there are RLA methods, such as approval voting, STAR-voting, Borda count, and general scoring rules. And while BRAVO has the smallest expected sample size among sequentially valid ballot-polling-with-replacement methods when reported vote shares are exactly right, it can require arbitrarily large samples when the reported reported winner(s) really won but reported vote shares are wrong. ALPHA is a simple generalization of BRAVO that (i) works for sampling with and without replacement and Bernoulli sampling; (ii) increases power for stratified audits by avoiding the need to use a $P$-value combining function or to maximize $P$-values over nuisance parameters within strata, and allowing adaptive sampling across strata; (iii) works not only for ballot-polling but also for ballot-level comparison, batch-polling, and batch-level comparison audits, sampling with or without replacement, uniformly or with weights proportional to size; (iv) works for all social choice functions covered by SHANGRLA; and (v) in situations where both ALPHA and BRAVO apply, requires smaller samples than BRAVO when the reported vote shares are wrong but the outcome is correct--five orders of magnitude in some examples. ALPHA includes the family of betting martingale tests in RiLACS, with a different betting strategy parametrized as an estimator of the population mean and explicit flexibility to accommodate sampling weights and population bounds that vary by draw.

CYJul 25, 2021
Assertion-Based Approaches to Auditing Complex Elections, with Application to Party-List Proportional Elections

Michelle Blom, Jurlind Budurushi, Ronald L. Rivest et al.

Risk-limiting audits (RLAs), an ingredient in evidence-based elections, are increasingly common. They are a rigorous statistical means of ensuring that electoral results are correct, usually without having to perform an expensive full recount -- at the cost of some controlled probability of error. A recently developed approach for conducting RLAs, SHANGRLA, provides a flexible framework that can encompass a wide variety of social choice functions and audit strategies. Its flexibility comes from reducing sufficient conditions for outcomes to be correct to canonical `assertions' that have a simple mathematical form. Assertions have been developed for auditing various social choice functions including plurality, multi-winner plurality, super-majority, Hamiltonian methods, and instant runoff voting. However, there is no systematic approach to building assertions. Here, we show that assertions with linear dependence on transformations of the votes can easily be transformed to canonical form for SHANGRLA. We illustrate the approach by constructing assertions for party-list elections such as Hamiltonian free list elections and elections using the D'Hondt method, expanding the set of social choice functions to which SHANGRLA applies directly.

CYFeb 17, 2021
Auditing Hamiltonian Elections

Michelle Blom, Philip B. Stark, Peter J. Stuckey et al.

Presidential primaries are a critical part of the United States Presidential electoral process, since they are used to select the candidates in the Presidential election. While methods differ by state and party, many primaries involve proportional delegate allocation using the so-called Hamilton method. In this paper we show how to conduct risk-limiting audits for delegate allocation elections using variants of the Hamilton method where the viability of candidates is determined either by a plurality vote or using instant runoff voting. Experiments on real-world elections show that we can audit primary elections to high confidence (small risk limits) usually at low cost.

CRDec 6, 2020
More style, less work: card-style data decrease risk-limiting audit sample sizes

Amanda K. Glazer, Jacob V. Spertus, Philip B. Stark

U.S. elections rely heavily on computers such as voter registration databases, electronic pollbooks, voting machines, scanners, tabulators, and results reporting websites. These introduce digital threats to election outcomes. Risk-limiting audits (RLAs) mitigate threats to some of these systems by manually inspecting random samples of ballot cards. RLAs have a large chance of correcting wrong outcomes (by conducting a full manual tabulation of a trustworthy record of the votes), but can save labor when reported outcomes are correct. This efficiency is eroded when sampling cannot be targeted to ballot cards that contain the contest(s) under audit. If the sample is drawn from all cast cards, RLA sample sizes scale like the reciprocal of the fraction of ballot cards that contain the contest(s) under audit. That fraction shrinks as the number of cards per ballot grows (i.e., when elections contain more contests) and as the fraction of ballots that contain the contest decreases (i.e., when a smaller percentage of voters are eligible to vote in the contest). States that conduct RLAs of contests on multi-card ballots or of small contests can dramatically reduce sample sizes by using information about which ballot cards contain which contests -- by keeping track of card-style data (CSD). For instance, CSD reduces the expected number of draws needed to audit a single countywide contest on a 4-card ballot by 75%. Similarly, CSD reduces the expected number of draws by 95% or more for an audit of two contests with the same margin on a 4-card ballot if one contest is on every ballot and the other is on 10% of ballots. In realistic examples, the savings can be several orders of magnitude.

APAug 19, 2020
A Unified Evaluation of Two-Candidate Ballot-Polling Election Auditing Methods

Zhuoqun Huang, Ronald L. Rivest, Philip B. Stark et al.

Counting votes is complex and error-prone. Several statistical methods have been developed to assess election accuracy by manually inspecting randomly selected physical ballots. Two 'principled' methods are risk-limiting audits (RLAs) and Bayesian audits (BAs). RLAs use frequentist statistical inference while BAs are based on Bayesian inference. Until recently, the two have been thought of as fundamentally different. We present results that unify and shed light upon 'ballot-polling' RLAs and BAs (which only require the ability to sample uniformly at random from all cast ballot cards) for two-candidate plurality contests, which are building blocks for auditing more complex social choice functions, including some preferential voting systems. We highlight the connections between the methods and explore their performance. First, building on a previous demonstration of the mathematical equivalence of classical and Bayesian approaches, we show that BAs, suitably calibrated, are risk-limiting. Second, we compare the efficiency of the methods across a wide range of contest sizes and margins, focusing on the distribution of sample sizes required to attain a given risk limit. Third, we outline several ways to improve performance and show how the mathematical equivalence explains the improvements.

APAug 21, 2019
They may look and look, yet not see: BMDs cannot be tested adequately

Philip B. Stark, Ran Xie

Bugs, misconfiguration, and malware can cause ballot-marking devices (BMDs) to print incorrect votes. Several approaches to testing BMDs have been proposed. In logic and accuracy testing (LAT) and parallel or live testing, auditors input known test votes into the BMD and check the printout. Passive testing monitors the rate of "spoiled" BMD printout, on the theory that if BMDs malfunction, the rate will increase noticeably. We show that these approaches cannot reliably detect outcome-altering problems, because: (i) The number of possible interactions with BMDs is enormous, so testing interactions uniformly at random is hopeless. (ii) To probe the space of interactions intelligently requires an accurate model of voter behavior, but because the space of interactions is so large, building an accurate model requires observing a huge number of voters in every jurisdiction in every election--more voters than there are in most jurisdictions. (iii) Even with a perfect model of voter behavior, the number of tests needed exceeds the number of voters in most jurisdictions. (iv) An attacker can target interactions that are expensive to test, e.g., because they involve voting slowly; or interactions for which tampering is less likely to be noticed, e.g., because the voter uses the audio interface. (v) Whether BMDs misbehave or not, the distribution of spoiled ballots is unknown and varies by election and possibly by ballot style: historical data do not help much. Hence, there is no way to calibrate a threshold for passive testing, e.g., to guarantee at least a 95% chance of noticing that 5% of the votes were altered, with at most a 5% false alarm rate. (vi) Even if the distribution of spoiled ballots were known to be Poisson, the vast majority of jurisdictions do not have enough voters for passive testing to have a large chance of detecting problems but only a small chance of false alarms.

CRAug 14, 2019
Risk-Limiting Tallies

Wojciech Jamroga, Peter B. Roenne, Peter Y. A. Ryan et al.

Many voter-verifiable, coercion-resistant schemes have been proposed, but even the most carefully designed systems necessarily leak information via the announced result. In corner cases, this may be problematic. For example, if all the votes go to one candidate then all vote privacy evaporates. The mere possibility of candidates getting no or few votes could have implications for security in practice: if a coercer demands that a voter cast a vote for such an unpopular candidate, then the voter may feel obliged to obey, even if she is confident that the voting system satisfies the standard coercion resistance definitions. With complex ballots, there may also be a danger of "Italian" style (aka "signature") attacks: the coercer demands the voter cast a ballot with a specific, identifying pattern. Here we propose an approach to tallying end-to-end verifiable schemes that avoids revealing all the votes but still achieves whatever confidence level in the announced result is desired. Now a coerced voter can claim that the required vote must be amongst those that remained shrouded. Our approach is based on the well-established notion of Risk-Limiting Audits, but here applied to the tally rather than to the audit. We show that this approach counters coercion threats arising in extreme tallies and "Italian" attacks. We illustrate our approach by applying it to the Selene scheme, and we extend the approach to Risk-Limiting Verification, where not all vote trackers are revealed, thereby enhancing the coercion mitigation properties of Selene.

MEJun 2, 2019
Confidence Intervals for Selected Parameters

Yoav Benjamini, Yotam Hechtlinger, Philip B. Stark

Practical or scientific considerations often lead to selecting a subset of parameters as ``important.'' Inferences about those parameters often are based on the same data used to select them in the first place. That can make the reported uncertainties deceptively optimistic: confidence intervals that ignore selection generally have less than their nominal coverage probability. Controlling the probability that one or more intervals for selected parameters do not cover---the ``simultaneous over the selected'' (SoS) error rate---is crucial in many scientific problems. Intervals that control the SoS error rate can be constructed in ways that take advantage of knowledge of the selection rule. We construct SoS-controlling confidence intervals for parameters deemed the most ``important'' $k$ of $m$ shift parameters because they are estimated (by independent estimators) to be the largest. The new intervals improve substantially over Šidák intervals when $k$ is small compared to $m$, and approach the standard Bonferroni-corrected intervals when $k \approx m$. Standard, unadjusted confidence intervals for location parameters have the correct coverage probability for $k=1$, $m=2$ if, when the true parameters are zero, the estimators are exchangeable and symmetric.

CRJan 10, 2019
Auditing Indian Elections

Vishal Mohanty, Nicholas Akinyokun, Andrew Conway et al.

Indian Electronic Voting Machines (EVMs) will be fitted with printers that produce Voter-Verifiable Paper Audit Trails (VVPATs) in time for the 2019 general election. VVPATs provide evidence that each vote was recorded as the voter intended, without having to trust the perfection or security of the EVMs. However, confidence in election results requires more: VVPATs must be preserved inviolate and then actually used to check the reported election result in a trustworthy way that the public can verify. A full manual tally from the VVPATs could be prohibitively expensive and time-consuming; moreover, it is difficult for the public to determine whether a full hand count was conducted accurately. We show how Risk-Limiting Audits (RLAs) could provide high confidence in Indian election results. Compared to full hand recounts, RLAs typically require manually inspecting far fewer VVPATs when the outcome is correct, and are much easier for the electorate to observe in adequate detail to determine whether the result is trustworthy.

CRJul 26, 2017
Public Evidence from Secret Ballots

Matthew Bernhard, Josh Benaloh, J. Alex Halderman et al.

Elections seem simple---aren't they just counting? But they have a unique, challenging combination of security and privacy requirements. The stakes are high; the context is adversarial; the electorate needs to be convinced that the results are correct; and the secrecy of the ballot must be ensured. And they have practical constraints: time is of the essence, and voting systems need to be affordable and maintainable, and usable by voters, election officials, and pollworkers. It is thus not surprising that voting is a rich research area spanning theory, applied cryptography, practical systems analysis, usable security, and statistics. Election integrity involves two key concepts: convincing evidence that outcomes are correct and privacy, which amounts to convincing assurance that there is no evidence about how any given person voted. These are obviously in tension. We examine how current systems walk this tightrope.

CROct 1, 2016
Auditing Australian Senate Ballots

Berj Chilingirian, Zara Perumal, Ronald L. Rivest et al.

We explain why the Australian Electoral Commission should perform an audit of the paper Senate ballots against the published preference data files. We suggest four different post-election audit methods appropriate for Australian Senate elections. We have developed prototype code for all of them and tested it on preference data from the 2016 election.