CL, Feb 14, 2018

Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur

arXiv:1802.05092v12.034 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of language documentation and processing for unwritten languages. It is incremental in that it summarizes a workshop rather than presenting new results.

The paper addresses the discovery of linguistic units (subwords and words) in unwritten languages, using multi-modal inputs such as images or translated text in place of orthographic transcriptions. It summarizes the workshop's exploration of computational methods for unsupervised unit discovery from raw speech.

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.
