DC Jan 4
Making MoE based LLM inference resilient with Tarragon
Songyu Zhang, Aaron Tam, Myungjin Lee et al.
Mixture-of-Experts (MoE) models are increasingly used to serve LLMs at scale, but failures become common as deployment scale grows. Existing systems exhibit poor failure resilience: even a single worker failure triggers a coarse-grained, service-wide restart, discarding accumulated progress and halting the entire inference pipeline during recovery, an approach clearly ill-suited for latency-sensitive LLM services. We present Tarragon, a resilient MoE inference framework that confines a failure's impact to individual workers while allowing the rest of the pipeline to continue making forward progress. Tarragon exploits the natural separation between the attention and expert computation in MoE-based transformers, treating attention workers (AWs) and expert workers (EWs) as distinct failure domains. Tarragon introduces a reconfigurable datapath to mask failures by rerouting requests to healthy workers. On top of this datapath, Tarragon implements a self-healing mechanism that relaxes the tightly synchronized execution of existing MoE frameworks. For stateful AWs, Tarragon performs asynchronous, incremental KV cache checkpointing with per-request restoration, and for stateless EWs, it leverages residual GPU memory to deploy shadow experts. Together, these keep recovery cost and recomputation overhead extremely low. Our evaluation shows that, compared to the state-of-the-art MegaScale-Infer, Tarragon reduces failure-induced stalls by 160-213x (from ~64 s down to 0.3-0.4 s) while preserving performance when no failures occur.
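The idea of incremental KV cache checkpointing with per-request restoration can be illustrated with a toy sketch. This is not Tarragon's implementation: the class names (`CheckpointStore`, `AttentionWorker`), the method names, and the use of integers as stand-ins for per-token KV entries are all assumptions made for illustration, and the asynchronous transfer described in the abstract is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical off-worker store holding per-request KV cache copies."""
    saved: dict[str, list[int]] = field(default_factory=dict)

    def append(self, request_id: str, new_entries: list[int]) -> None:
        # Incremental: only entries produced since the last checkpoint arrive.
        self.saved.setdefault(request_id, []).extend(new_entries)

    def restore(self, request_id: str) -> list[int]:
        # Per-request restoration: return a copy of one request's entries.
        return list(self.saved.get(request_id, []))


@dataclass
class AttentionWorker:
    """Toy stateful worker; integers stand in for per-token KV entries."""
    kv: dict[str, list[int]] = field(default_factory=dict)
    checkpointed: dict[str, int] = field(default_factory=dict)

    def decode_step(self, request_id: str, token: int) -> None:
        self.kv.setdefault(request_id, []).append(token)

    def checkpoint(self, store: CheckpointStore) -> None:
        # Send only the suffix that has not been checkpointed yet.
        for rid, entries in self.kv.items():
            start = self.checkpointed.get(rid, 0)
            if start < len(entries):
                store.append(rid, entries[start:])
                self.checkpointed[rid] = len(entries)

    def adopt(self, request_id: str, store: CheckpointStore) -> int:
        # Take over a single request; return how many tokens were recovered.
        restored = store.restore(request_id)
        self.kv[request_id] = restored
        self.checkpointed[request_id] = len(restored)
        return len(restored)


store = CheckpointStore()
failed = AttentionWorker()
for t in (11, 12, 13):
    failed.decode_step("req-a", t)
failed.checkpoint(store)
failed.decode_step("req-a", 14)  # produced after the last checkpoint

healthy = AttentionWorker()
covered = healthy.adopt("req-a", store)  # only token 14 needs recomputation
```

In this sketch, a healthy worker that adopts the request recovers everything up to the last checkpoint and would need to recompute only the entries produced afterwards, which is the intuition behind keeping recomputation overhead low.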
CY Dec 6, 2019
An Algorithmic Equity Toolkit for Technology Audits by Community Advocates and Activists
Michael Katell, Meg Young, Bernease Herman et al.
A wave of recent scholarship documenting the discriminatory harms of algorithmic systems has spurred widespread interest in algorithmic accountability and regulation. Yet effective accountability and regulation are stymied by a persistent lack of resources supporting public understanding of algorithms and artificial intelligence. Through interactions with a US-based civil rights organization and their coalition of community organizations, we identify a need for (i) heuristics that aid stakeholders in distinguishing between types of analytic and information systems in lay language, and (ii) risk assessment tools for such systems that begin by making algorithms more legible. The present work delivers a toolkit to achieve these aims. This paper both presents the Algorithmic Equity Toolkit (AEKit) as an artifact and details how our participatory process shaped its design. Our work fits within human-computer interaction scholarship as a demonstration of the value of HCI methods and approaches to problems in the area of algorithmic transparency and accountability.
HC Feb 15, 2013
An Online Environment for Democratic Deliberation: Motivations, Principles, and Design
Todd Davies, Brendan O'Connor, Alex Cochran et al.
We have created a platform for online deliberation called Deme (which rhymes with 'team'). Deme is designed to allow groups of people to engage in collaborative drafting, focused discussion, and decision making using the Internet. The Deme project has evolved greatly since its beginning in 2003. This chapter outlines the thinking behind Deme's initial design: our motivations for creating it, the principles that guided its construction, and its most important design features. The version of Deme described here was written in PHP, deployed in 2004, and used by several groups (including organizers of the 2005 Online Deliberation Conference). Other papers describe later developments in the Deme project (see Davies et al. 2005, 2008; Davies and Mintz 2009).
HC Feb 14, 2013
Displaying Asynchronous Reactions to a Document: Two Goals and a Design
Todd Davies, Benjamin Newman, Brendan O'Connor et al.
We describe and motivate two goals for the screen display of asynchronous text deliberation pertaining to a document: (1) visibility of relationships between comments and the text they reference, between different comments, and between group members and the document and discussion, and (2) distinguishability of boundaries between contextually related and unrelated text and comments and between individual authors of documents and comments. Interfaces for document-centered discussion generally fail to fulfill one or both of these goals as well as they could. We describe the design of the new version of Deme, a Web-based platform for online deliberation, and argue that it achieves the two goals better than other recent designs.