2.2 DC Apr 17
Compositional Design, Implementation, and Verification of Swarms (Technical Report)
Florian Furbach, Lucas Clorius, Roland Kuhn et al.
Swarm protocols are a recently introduced formalism for specifying, implementing, and verifying peer-to-peer systems called swarms. A swarm consists of distributed agents called machines that communicate by asynchronous event propagation. Following a local-first model, each machine can progress without requiring continuous connectivity to other machines. Existing models of swarms are not compositional, making the modular development of large and complex swarm applications as well as the reuse of code difficult. We address these issues by presenting novel theory and techniques for the compositional specification, verification, and implementation of swarms. These results enable the correct compositional reuse of pre-existing swarm protocols and machine implementations. We implement these contributions in a companion software artifact which enables the automatic integration of independently designed and verified swarm components.
SD Jul 14, 2025
Supporting SENCOTEN Language Documentation Efforts with Automatic Speech Recognition
Mengzhe Geng, Patrick Littell, Aidan Pine et al.
The SENCOTEN language, spoken on the Saanich peninsula of southern Vancouver Island, is in the midst of vigorous language revitalization efforts to turn the tide of language loss as a result of colonial language policies. To support these on-the-ground efforts, the community is turning to digital technology. Automatic Speech Recognition (ASR) technology holds great promise for accelerating language documentation and the creation of educational resources. However, developing ASR systems for SENCOTEN is challenging due to limited data and significant vocabulary variation from its polysynthetic structure and stress-driven metathesis. To address these challenges, we propose an ASR-driven documentation pipeline that leverages augmented speech data from a text-to-speech (TTS) system and cross-lingual transfer learning with Speech Foundation Models (SFMs). An n-gram language model is also incorporated via shallow fusion or n-best rescoring to maximize the use of available data. Experiments on the SENCOTEN dataset show a word error rate (WER) of 19.34% and a character error rate (CER) of 5.09% on the test set with a 57.02% out-of-vocabulary (OOV) rate. After filtering minor cedilla-related errors, WER improves to 14.32% (26.48% on unseen words) and CER to 3.45%, demonstrating the potential of our ASR-driven pipeline to support SENCOTEN language documentation.
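For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, over words and characters respectively. The function names (`edit_distance`, `wer`, `cer`) are illustrative assumptions, and this is not the authors' evaluation code; in particular, it applies no text normalization and does not reproduce the cedilla-related filtering mentioned in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions, and deletions each cost 1.
    Works on any sequences: lists of words or strings of characters.
    """
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Because one wrong character makes the whole word count as an error, WER is typically much higher than CER when words are long, which is consistent with the gap between the two figures reported for a polysynthetic language.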
CL Jun 7, 2018
A Challenge Set for French --> English Machine Translation
Pierre Isabelle, Roland Kuhn
We present a challenge set for French --> English machine translation based on the approach introduced in Isabelle, Cherry and Foster (EMNLP 2017). Such challenge sets are made up of sentences that are expected to be relatively difficult for machines to translate correctly because their most straightforward translations tend to be linguistically divergent. We present here a set of 506 manually constructed French sentences, 307 of which are targeted to the same kinds of structural divergences as in the paper mentioned above. The remaining 199 sentences are designed to test the ability of the systems to correctly translate difficult grammatical words such as prepositions. We report on the results of using this challenge set for testing two different systems, namely Google Translate and DEEPL, each on two different dates (October 2017 and January 2018). All the resulting data are made publicly available.