Word-level Persian Lipreading Dataset
This work provides a domain-specific resource for Persian lipreading, but its contribution is incremental, since it applies existing methods to new data.
The authors address the lack of a suitable dataset for Persian lipreading by creating a new in-the-wild dataset with 244,000 videos from approximately 1,800 speakers. Using the AV-HuBERT model for feature extraction, they report significantly better performance on this dataset.
Lipreading has made impressive progress in recent years, driven by advances in deep learning. Nonetheless, the prerequisite for such advances is a suitable dataset. This paper provides a new in-the-wild dataset for Persian word-level lipreading containing 244,000 videos from approximately 1,800 speakers. We evaluated the state-of-the-art method in this field and used a novel approach for word-level lipreading. In this method, we used the AV-HuBERT model for feature extraction and obtained significantly better performance on our dataset.
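The abstract does not say how the extracted features are turned into a word prediction. As a rough illustration only, the sketch below assumes one common setup: frame-level features from a pretrained encoder are averaged over time and passed to a linear softmax classifier. The feature values are random placeholders, not AV-HuBERT output, and every name and size here (`mean_pool`, `classify`, `FEATURE_DIM`, `NUM_WORDS`, `NUM_FRAMES`) is hypothetical rather than taken from the paper.

```python
import math
import random


def mean_pool(frames):
    """Average frame-level feature vectors over time into one clip-level vector."""
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, weights, bias):
    """Linear word classifier on pooled features; weights has shape [num_words][dim]."""
    pooled = mean_pool(frames)
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


random.seed(0)
FEATURE_DIM = 8   # hypothetical; real encoder features are much larger
NUM_WORDS = 5     # hypothetical vocabulary size
NUM_FRAMES = 29   # hypothetical number of video frames per word clip

# Placeholder standing in for features a pretrained visual encoder would produce.
frames = [[random.gauss(0, 1) for _ in range(FEATURE_DIM)] for _ in range(NUM_FRAMES)]

# Untrained classifier parameters; in practice these would be learned.
weights = [[random.gauss(0, 0.1) for _ in range(FEATURE_DIM)] for _ in range(NUM_WORDS)]
bias = [0.0] * NUM_WORDS

probs = classify(frames, weights, bias)
predicted_word_index = max(range(NUM_WORDS), key=probs.__getitem__)
```

Mean pooling is the simplest way to collapse a variable-length clip into a fixed-size vector; a temporal model over the frame sequence is another common choice, and the paper may use something different.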