CV, Apr 8, 2023

Word-level Persian Lipreading Dataset

Javad Peymanfard, Ali Lashini, Samin Heydarian, Hossein Zeinali, Nasser Mozayani

arXiv:2304.04068v15.07 citations, h-index: 18

Originality: Synthesis-oriented

AI Analysis

This paper provides a domain-specific resource for Persian lipreading, but its contribution is incremental because it applies existing methods to new data.

The authors tackled the lack of a suitable dataset for Persian lipreading by creating a new in-the-wild dataset with 244,000 videos from 1,800 speakers, and they achieved significantly better performance using the AV-HuBERT model for feature extraction.

Lip-reading has made impressive progress in recent years, driven by advances in deep learning. Nonetheless, the prerequisite for such advances is a suitable dataset. This paper provides a new in-the-wild dataset for Persian word-level lip-reading containing 244,000 videos from approximately 1,800 speakers. We evaluated the state-of-the-art method in this field and used a novel approach for word-level lip-reading. In this method, we used the AV-HuBERT model for feature extraction and obtained significantly better performance on our dataset.
