CL, Feb 14, 2018

Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

arXiv:1802.05092v1, 34 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of language documentation and processing for unwritten languages, but it is incremental as it summarizes a workshop rather than presenting new results.

The paper addresses the discovery of linguistic units, such as subwords and words, in unwritten languages, using multi-modal inputs such as images or translated text in place of orthographic transcriptions. It summarizes the workshop's accomplishments in exploring computational methods for unsupervised discovery from raw speech.

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.

