SD, AI, DC, DL, LG, AS | Sep 7, 2023

Large-Scale Automatic Audiobook Creation

Microsoft, UW
arXiv:2309.03926v1 | 2 citations | h-index: 58
Originality: Incremental advance
AI Analysis

This work addresses the accessibility and engagement challenges for readers by providing a scalable solution to audiobook creation, though it is incremental as it builds on existing neural text-to-speech technology.

The authors tackled the problem of creating audiobooks from e-books, which traditionally requires extensive human effort, by developing an automated system that generated over five thousand high-quality, open-license audiobooks from the Project Gutenberg collection.

An audiobook can dramatically improve a work of literature's accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. Our method can identify the proper subset of e-book content to read for a wide collection of diversely structured books and can operate on hundreds of books in parallel. Our system allows users to customize an audiobook's speaking speed and style, emotional intonation, and can even match a desired voice using a small amount of sample audio. This work contributed over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection visit https://aka.ms/audiobook.
